GLYCOSIDASE ENZYMES
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to newly identified polynucleotides, polypeptides encoded by such
polynucleotides, the use of such polynucleotides and polypeptides, as well as the
production and isolation of such polynucleotides and polypeptides. More particularly,
the polynucleotides and polypeptides of the present invention have been putatively
identified as glucosidases, α-galactosidases, β-galactosidases, β-mannosidases,
β-mannanases, endoglucanases, and pullulanases.

2. Description of Related Art

The glycosidic bond of β-galactosides can be cleaved by different classes of enzymes:

(i) phospho-β-galactosidases (EC 3.2.1.85), which are specific for a phosphorylated
substrate generated via phosphoenolpyruvate phosphotransferase system (PTS)-dependent
uptake;

(ii) typical β-galactosidases (EC 3.2.1.23), represented by the Escherichia coli LacZ
enzyme, which are relatively specific for β-galactosides; and

(iii) β-glucosidases (EC 3.2.1.21), such as the enzymes of Agrobacterium faecalis,
Clostridium thermocellum, Pyrococcus furiosus or Sulfolobus solfataricus (Day, A.G. and
Withers, S.G. (1986) Purification and characterization of a β-glucosidase from
Alcaligenes faecalis. Can. J. Biochem. Cell Biol. 64, 914-922; Kengen, S.W.M., et al.
(1993) Eur. J. Biochem. 213, 305-312; Ait, N., Creuzet, N. and Cattaneo, J. (1982)
Properties of β-glucosidase purified from Clostridium thermocellum. J. Gen. Microbiol.
128, 569-577; Grogan, D.W. (1991) Evidence that β-galactosidase of Sulfolobus
solfataricus is only one of several activities of a thermostable β-D-glycosidase. Appl.
Environ. Microbiol. 57, 1644-1649). Members of the latter group, although highly
specific with respect to the β-anomeric configuration of the glycosidic linkage, often
display a rather relaxed substrate specificity and hydrolyze β-glucosides as well as
β-fucosides and β-galactosides.

Generally, α-galactosidases are enzymes that catalyze the hydrolysis of galactose groups
on a polysaccharide backbone or the cleavage of di- or oligosaccharides
comprising galactose.

Generally, β-mannanases are enzymes that catalyze the hydrolysis of mannose groups
internally on a polysaccharide backbone or the cleavage of di- or oligosaccharides
comprising mannose groups. β-Mannosidases hydrolyze non-reducing, terminal mannose
residues on a mannose-containing polysaccharide and cleave di- or oligosaccharides
comprising mannose groups.

Guar gum is a branched galactomannan polysaccharide composed of a β-1,4 linked
mannose backbone with α-1,6 linked galactose side chains. The enzymes required for
the degradation of guar are β-mannanase, β-mannosidase and α-galactosidase.
β-Mannanase hydrolyzes the mannose backbone internally, β-mannosidase hydrolyzes
non-reducing, terminal mannose residues, and α-galactosidase hydrolyzes α-linked
galactose groups.

Galactomannan polysaccharides and the enzymes that degrade them have a variety of
applications. Guar is commonly used as a thickening agent in food and is utilized in
hydraulic fracturing in oil and gas recovery. Consequently, galactomannanases are
industrially relevant for the degradation and modification of guar. Furthermore, a need
exists for thermostable galactomannanases that are active in extreme conditions
associated with drilling and well stimulation.

There are other applications for these enzymes in various industries, such as in the beet
sugar industry. 20-30% of the domestic U.S. sucrose consumption is sucrose from sugar
beets. Raw beet sugar can contain a small amount of raffinose when the sugar beets are
stored before processing and rotting begins to set in. Raffinose inhibits the
crystallization of sucrose and also constitutes a hidden quantity of sucrose. Thus, there
is merit to eliminating raffinose from raw beet sugar. α-Galactosidase has also been used
as a digestive aid to break down raffinose, stachyose, and verbascose in such foods as
beans and other gassy foods.

β-Galactosidases which are active and stable at high temperatures appear to be superior
enzymes for the production of lactose-free dietary milk products (Chaplin, M.F. and
Bucke, C. (1990) In: Enzyme Technology, pp. 159-160, Cambridge University Press,
Cambridge, UK). Also, several studies have demonstrated the applicability of
β-galactosidases to the enzymatic synthesis of oligosaccharides via transglycosylation
reactions (Nilsson, K.G.I. (1988) Enzymatic synthesis of oligosaccharides. Trends
Biotechnol. 6, 156-264; Cote, G.L. and Tao, B.Y. (1990) Oligosaccharide synthesis by
enzymatic transglycosylation. Glycoconjugate J. 7, 145-162). Despite the commercial
potential, only a few β-galactosidases of thermophiles have been characterized so far.
Two genes reported encode β-galactoside-cleaving enzymes of the hyperthermophilic
bacterium Thermotoga maritima, one of the most thermophilic organotrophic eubacteria
described to date (Huber, R., Langworthy, T.A., Konig, H., Thomm, M., Woese, C.R.,
Sleytr, U.B. and Stetter, K.O. (1986) Thermotoga maritima sp. nov. represents a new
genus of unique extremely thermophilic eubacteria growing up to 90 °C. Arch. Microbiol.
144, 324-333). The gene products have been identified as a β-galactosidase and a
β-glucosidase.

Pullulanase is well known as a debranching enzyme of pullulan and starch. The enzyme
hydrolyzes α-1,6-glucosidic linkages on these polymers. Starch degradation for the
production of sweeteners (glucose or maltose) is a very important industrial application
of this enzyme. The degradation of starch proceeds in two stages. The first stage
involves the liquefaction of the substrate with α-amylase, and the second stage, or
saccharification stage, is performed by β-amylase with pullulanase added as a
debranching enzyme to obtain better yields.

Endoglucanases can be used in a variety of industrial applications. For instance, the
endoglucanases of the present invention can hydrolyze the internal β-1,4-glycosidic
bonds in cellulose, which may be used for the conversion of plant biomass into fuels and
chemicals. Endoglucanases also have applications in detergent formulations, the textile
industry, in animal feed, in waste treatment, and in the fruit juice and brewing industry
for the clarification and extraction of juices.

Brief Description of the Drawings

The following drawings are illustrative of embodiments of the invention and are not 
meant to limit the scope of the invention as encompassed by the claims. 

Figures 1a-b are the full-length DNA and corresponding deduced amino acid sequence
of M11TL of the present invention. Sequencing was performed using a 378 automated
DNA sequencer for all sequences of the present invention (Applied Biosystems, Inc.).

Figure 2 is an illustration of the full-length DNA and corresponding deduced amino acid
sequence of OC1/4V-33B/G.

Figure 3 is an illustration of the full-length DNA and corresponding deduced amino acid
sequence of F1-12G.

Figures 4a-b are the full-length DNA and corresponding deduced amino acid sequence
of 9N2-31B/G.

Figures 5a-b are the full-length DNA and corresponding deduced amino acid sequence
of MSB8-6G.

Figure 6 is the full-length DNA and corresponding deduced amino acid sequence of
AEDII12RA-18B/G.

Figures 7a-b are the full-length DNA and corresponding deduced amino acid sequence
of GC74-22G.

Figures 8a-b are the full-length DNA and corresponding deduced amino acid sequence
of VC1-7G1.

Figures 9a-c are the full-length DNA and corresponding deduced amino acid sequence
of 37GP1.

Figures 10a-c are the full-length DNA and corresponding deduced amino acid sequence
of 6GC2.

Figures 11a-d are the full-length DNA and corresponding deduced amino acid sequence
of 6GP2.

Figures 12a-c are the full-length DNA and corresponding deduced amino acid sequence
of 63GB1.

Figures 13a-b are the full-length DNA and corresponding deduced amino acid sequence
of OC1/4V.

Figures 14a-e are the full-length DNA and corresponding deduced amino acid sequence
of 6GP3.

Figures 15a-d are the full-length DNA and corresponding deduced amino acid sequence
of Thermotoga maritima MSB8-6GP2.

Figures 16a-c are the full-length DNA and corresponding deduced amino acid sequence
of Thermotoga maritima MSB8-6GB4.

Figures 17a-d are the full-length DNA and corresponding deduced amino acid sequence
of Bankia gouldi 37GP4.

Figures 18a-b are the full-length DNA and corresponding deduced amino acid sequence
of Pyrococcus furiosus VC1-7EG1.

SUMMARY OF THE INVENTION

In a preferred embodiment of the present invention, there are provided isolated nucleic
acids (polynucleotides) which encode mature enzymes having the deduced amino acid
sequences of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64).

In another embodiment, the invention provides a method for producing a polypeptide
including culturing host cells containing the polynucleotide of Figures 1-18,
expressing from the host cell a polypeptide encoded by the polynucleotide, and isolating
the polypeptide.

In another embodiment, the invention provides an enzyme selected from the group
consisting of an enzyme having an amino acid sequence set forth in SEQ ID NOS: 15-28
or 61-64 and an enzyme which has at least 30 consecutive amino acid residues of an
enzyme having an amino acid sequence set forth in SEQ ID NOS: 15-28 or 61-64.

In yet another embodiment, the invention provides a method for generating glucose from
soluble cell oligosaccharides which includes contacting a sample containing
oligosaccharides with an effective amount of an enzyme selected from the group of
enzymes having the amino acid sequence set forth in SEQ ID NOS: 15-28, 61-63 and 64
such that glucose is produced.

The publications discussed herein are provided solely for their disclosure prior to the
filing date of the present application. Nothing herein is to be construed as an admission
that the invention is not entitled to antedate such disclosure by virtue of prior invention.



Definitions

"Monosaccharide", as used herein, refers to a single polyhydroxy aldehyde or ketone
unit.

"Oligosaccharide", as used herein, consists of short chains of monosaccharide units
joined together by covalent bonds. Of these, the most abundant are the disaccharides,
which have two monosaccharide units.

"Polysaccharide", as used herein, consists of long chains having many monosaccharide
units.

The term "gene" means the segment of DNA involved in producing a polypeptide chain; 
it includes regions preceding and following the coding region (leader and trailer) as well 
as intervening sequences (introns) between individual coding segments (exons). 

A coding sequence is "operably linked to" another coding sequence when RNA
polymerase will transcribe the two coding sequences into a single mRNA, which is then
translated into a single polypeptide having amino acids derived from both coding
sequences. The coding sequences need not be contiguous to one another so long as the
expressed sequences are ultimately processed to produce the desired protein.

"Recombinant" enzymes refer to enzymes produced by recombinant DNA techniques,
i.e., produced from cells transformed by an exogenous DNA construct encoding the
desired enzyme. "Synthetic" enzymes are those prepared by chemical synthesis.

A DNA "coding sequence of" or a "nucleotide sequence encoding" a particular enzyme
is a DNA sequence which is transcribed and translated into an enzyme when placed
under the control of appropriate regulatory sequences.

Detailed Description of the Invention

The polynucleotides and polypeptides of the present invention have been identified as
glucosidases, α-galactosidases, β-galactosidases, β-mannosidases, β-mannanases,
endoglucanases, and pullulanases as a result of their enzymatic activity.

In accordance with one aspect of the present invention, there are provided novel
enzymes, as well as active fragments, analogs and derivatives thereof.

In accordance with another aspect of the present invention, there are provided isolated
nucleic acid molecules encoding the enzymes of the present invention including mRNAs,
cDNAs, genomic DNAs as well as active analogs and fragments of such enzymes.

In accordance with yet a further aspect of the present invention, there is provided a
process for producing such polypeptides by recombinant techniques comprising culturing
recombinant prokaryotic and/or eukaryotic host cells, containing a nucleic acid sequence
of the present invention, under conditions promoting expression of said enzymes and
subsequent recovery of said enzymes.

In accordance with yet a further aspect of the present invention, there is provided a
process for utilizing such enzymes, or polynucleotides encoding such enzymes, for
hydrolyzing lactose to galactose and glucose for use in the food processing industry, the
pharmaceutical industry, for example, to treat intolerance to lactose, as a diagnostic
reporter molecule, in corn wet milling, in the fruit juice industry, in baking, in the textile
industry and in the detergent industry.

In accordance with yet a further aspect of the present invention, there is provided a
process for utilizing such enzymes for hydrolyzing guar gum (a galactomannan
polysaccharide) to remove non-reducing terminal mannose residues. Further,
polysaccharides such as galactomannan and the enzymes according to the invention that
degrade them have a variety of applications. Guar gum is commonly used as a
thickening agent in food and also is utilized in hydraulic fracturing in oil and gas
recovery. Consequently, mannanases are industrially relevant for the degradation and
modification of guar gums. Furthermore, a need exists for thermostable mannanases that
are active in extreme conditions associated with drilling and well stimulation.

In accordance with yet a further aspect of the present invention, there are also provided
nucleic acid probes comprising nucleic acid molecules of sufficient length to specifically
hybridize to a nucleic acid sequence of the present invention.

In accordance with yet a further aspect of the present invention, there is provided a
process for utilizing such enzymes, or polynucleotides encoding such enzymes, for in
vitro purposes related to scientific research, for example, to generate probes for
identifying similar sequences which might encode similar enzymes from other organisms
by using certain regions, i.e., conserved sequence regions, of the nucleotide sequence.

These and other aspects of the present invention should be apparent to those skilled in
the art from the teachings herein.

The polynucleotides of this invention were originally recovered from genomic gene
libraries derived from the following organisms:

M11TL is a new species of Desulfurococcus isolated from Diamond Pool in Yellowstone
National Park. The organism grows optimally at 85-88 °C, pH 7.0 in a low salt medium
containing yeast extract, peptone, and gelatin as substrates with a N2/CO2 gas phase.

OC1/4V is from the genus Thermotoga. The organism was isolated from Yellowstone
National Park. It grows optimally at 75 °C in a low salt medium with cellulose as a
substrate and N2 in gas phase.

Pyrococcus furiosus VC1 (and 7EG1) is from the genus Pyrococcus. VC1 was isolated
from Vulcano, Italy. It grows optimally at 100 °C in a high salt medium (marine)
containing elemental sulfur, yeast extract, peptone and starch as substrates and N2 in gas
phase.

Staphylothermus marinus F1 is from the genus Staphylothermus. F1 was isolated from
Vulcano, Italy. It grows optimally at 85 °C, pH 6.5 in high salt medium (marine)
containing elemental sulfur and yeast extract as substrates and N2 in gas phase.

Thermococcus 9N-2 is from the genus Thermococcus. 9N-2 was isolated from diffuse
vent fluid in the East Pacific Rise. It is a strict anaerobe that grows optimally at 87 °C.

Thermotoga maritima MSB8 and MSB8 (clones 6GP2 and 6GB4) is from the genus
Thermotoga, and was isolated from Vulcano, Italy. MSB8 grows optimally at 85 °C, pH
6.5 in a high salt medium (marine) containing starch and yeast extract as substrates and
N2 in gas phase.

Thermococcus alcaliphilus AEDII12RA is from the genus Thermococcus. AEDII12RA
grows optimally at 85 °C, pH 9.5 in a high salt medium (marine) containing polysulfides
and yeast extract as substrates and N2 in gas phase.

Thermococcus chitonophagus GC74 is from the genus Thermococcus. GC74 grows
optimally at 85 °C, pH 6.0 in a high salt medium (marine) containing chitin, meat extract,
elemental sulfur and yeast extract as substrates and N2 in gas phase.

AEPII 1a grows optimally at 85 °C at pH 6.5 in marine medium under anaerobic
conditions. It has many substrates.

Bankia gouldi is from the genus Bankia.

Accordingly, the polynucleotides and enzymes encoded thereby are identified by the
organism from which they were isolated, and are sometimes hereinafter referred to as
"M11TL" (Figure 1 and SEQ ID NOS:1 and 15), "OC1/4V-33B/G" (Figure 2 and SEQ
ID NOS:2 and 16), "F1-12G" (Figure 3 and SEQ ID NOS:3 and 17), "9N2-31B/G"
(Figure 4 and SEQ ID NOS:4 and 18), "MSB8" (Figure 5 and SEQ ID NOS:5 and 19),
"AEDII12RA-18B/G" (Figure 6 and SEQ ID NOS:6 and 20), "GC74-22G" (Figure 7 and
SEQ ID NOS:7 and 21), "VC1-7G1" (Figure 8 and SEQ ID NOS:8 and 22), "37GP1"
(Figure 9 and SEQ ID NOS:9 and 23), "6GC2" (Figure 10 and SEQ ID NOS:10 and
24), "6GP2" (Figure 11 and SEQ ID NOS:11 and 25), "AEPII 1a" (Figure 12 and SEQ
ID NOS:12 and 26), "OC1/4V" (Figure 13 and SEQ ID NOS:13 and 27), "6GP3"
(Figure 14 and SEQ ID NOS:28), "MSB8-6GP2" (Figure 15 and SEQ ID NOS:57 and
61), "MSB8-6GB4" (Figure 16 and SEQ ID NOS:58 and 62), "VC1-7EG1" (Figure 17 and
SEQ ID NOS:59 and 63), and "37GP4" (Figure 18 and SEQ ID NOS:60 and 64).

The polynucleotides and polypeptides of the present invention show identity at the 
nucleotide and protein level to known genes and proteins encoded thereby as shown in 
Table 1. 

Table 1

Clone                                       Gene/Protein With Closest Homology                        Protein    Nucleic Acid
                                                                                                      Identity   Identity

M11TL-29G                                   Sulfolobus solfataricus DSM 1616/P1, β-galactosidase      51%        55%
OC1/4V-33B/G                                Caldocellum saccharolyticum, β-glucosidase                52%        57%
Staphylothermus marinus F1-12G              Bacillus polymyxa, β-galactosidase                        36%        48%
Thermococcus 9N2-31B/G                      Sulfolobus solfataricus ATCC 49255/MT4, β-galactosidase   51%        50%
Thermotoga maritima MSB8-6G                 Clostridium thermocellum bglB                             45%        53%
Thermococcus AEDII12RA-18B/G                Bacillus polymyxa, β-galactosidase                        34%        48%
Thermococcus chitonophagus GC74-22G         Sulfolobus solfataricus ATCC 49255/MT4, β-galactosidase   46%        54%
Pyrococcus furiosus VC1-7G1                 Sulfolobus solfataricus/MT-4 β-galactosidase              46.4%      52.5%
Thermotoga maritima α-galactosidase (6GC2)  Pediococcus pentosaceus α-galactosidase                   49%        29%
Thermotoga maritima β-mannanase (6GP2)      Aspergillus aculeatus mannanase                           56%        37%
AEPII 1a β-mannosidase (63GB1)              Sulfolobus solfataricus β-galactosidase                   78%        56%
OC1/4V endoglucanase (33GP1)                Clostridium thermocellum endo-1,4-β-endoglucanase         65%        43%
Thermotoga maritima pullulanase (6GP3)      Caldocellum saccharolyticum α-dextran 6-glucanohydrolase  72%        53%
Bankia gouldi mix Endoglucanase (37GP1)     None available

The polynucleotides and enzymes of the present invention show homology to each other 
as shown in Table 2. 

Table 2

Clone                           Gene/Protein with Closest Homology                          Protein    Nucleic Acid
                                                                                            Identity   Identity

Staphylothermus marinus F1-12G  Thermococcus AEDII12RA-18B/G, β-galactosidase, glucosidase  55%        57%
Thermococcus 9N2-31B/G          Thermococcus chitonophagus GC74-22G-glucosidase             74%        66%
Pyrococcus furiosus VC1-7G1     Pyrococcus furiosus VC1-7B/G β-galactosidase                46.4%      54%

All the clones identified in Tables 1 and 2 encode polypeptides which have α-glycosidase
or β-glycosidase activity.

This invention, in addition to the isolated nucleic acid molecules encoding the enzymes
of the present invention, also provides substantially similar sequences. Isolated nucleic
acid sequences are substantially similar if: (i) they are capable of hybridizing under
conditions hereinafter described to the polynucleotides of SEQ ID NOS: 1-14 and 57-60;
or (ii) they encode DNA sequences which are degenerate to the polynucleotides of SEQ
ID NOS: 1-14 and 57-60. Degenerate DNA sequences encode the amino acid sequences
of SEQ ID NOS: 15-28 and 61-64, but have variations in the nucleotide coding
sequences. As used herein, substantially similar refers to the sequences having similar
identity to the sequences of the instant invention. The nucleotide sequences that are
substantially the same can be identified by hybridization or by sequence comparison.
Enzyme sequences that are substantially the same can be identified by one or more of the
following: proteolytic digestion, gel electrophoresis and/or microsequencing.
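
The notion of degenerate coding sequences can be illustrated with a minimal sketch. It
assumes the standard genetic code, and the two short sequences and the function names
are hypothetical illustrations, not taken from the sequence listing. It shows two
coding sequences that differ at the nucleotide level yet encode the same amino acids.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order (an assumption of this
# sketch; an organism using an alternative code would need another table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def are_degenerate(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ in bases but encode the same polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


# Hypothetical example sequences (not from SEQ ID NOS: 1-14 or 57-60).
reference = "ATGGCTAAAGGT"  # Met-Ala-Lys-Gly
variant = "ATGGCGAAGGGC"    # same peptide, different third-position bases

print(translate(reference), translate(variant), are_degenerate(reference, variant))
```

In this sketch the variant differs from the reference only at third codon positions,
which is the most common source of silent variation.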

One means for isolating the nucleic acid molecules encoding the enzymes of the present
invention is to probe a gene library with a natural or artificially designed probe using
art-recognized procedures (see, for example, Current Protocols in Molecular Biology,
Ausubel F.M. et al. (Eds.), Green Publishing Company Assoc. and John Wiley
Interscience, New York, 1989, 1992). It is appreciated by one skilled in the art that the
polynucleotides of SEQ ID NOS: 1-14 and 57-60, or fragments thereof (comprising at
least 12 contiguous nucleotides), are particularly useful probes. Other particularly useful
probes for this purpose are fragments hybridizable to the sequences of SEQ ID NOS: 1-14
and 57-60 (i.e., comprising at least 12 contiguous nucleotides).

With respect to nucleic acid sequences which hybridize to specific nucleic acid
sequences disclosed herein, hybridization may be carried out under conditions of reduced
stringency, medium stringency or even stringent conditions. As an example of
oligonucleotide hybridization, a polymer membrane containing immobilized denatured
nucleic acids is first prehybridized for 30 minutes at 45 °C in a solution consisting of 0.9
M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM Na2EDTA, 0.5% SDS, 10X Denhardt's, and
0.5 mg/ml polyriboadenylic acid. Approximately 2 X 10⁷ cpm (specific activity 4-9 X
10⁸ cpm/µg) of 32P end-labeled oligonucleotide probe are then added to the solution.
After 12-16 hours of incubation, the membrane is washed for 30 minutes at room
temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM
Na2EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1X SET at
Tm - 10 °C for the oligonucleotide probe. The membrane is then exposed to
auto-radiographic film for detection of hybridization signals.
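
The final wash temperature depends on the melting temperature (Tm) of the
oligonucleotide probe, which the text does not say how to determine. The sketch below
is an illustration under two assumptions: Tm is estimated with the Wallace rule, a
widely used approximation for short oligonucleotides (2 °C per A or T, 4 °C per G or C),
and the wash is taken to be 10 °C below that estimate. The probe sequence and function
names are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C).

    This approximation is an assumption of the sketch; it is intended only
    for short oligonucleotides and is not prescribed by the text.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo may contain only A, C, G and T")
    return 2 * at + 4 * gc


def final_wash_temperature(oligo: str, offset_c: int = 10) -> int:
    """Temperature of the final wash, taken as offset_c degrees below Tm."""
    return wallace_tm(oligo) - offset_c


# Hypothetical 15-mer probe (not from the sequence listing).
probe = "ATGGCTAAAGGTCTG"
print(wallace_tm(probe), final_wash_temperature(probe))
```

For longer probes or different salt conditions, a salt-adjusted or nearest-neighbor
estimate would normally replace the Wallace rule.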

Stringent conditions means hybridization will occur only if there is at least 90% identity,
preferably at least 95% identity and most preferably at least 97% identity between the
sequences. Further, it is understood that a section of a 100 bps sequence that is 95 bps
in length has 95% identity with the 100 bps sequence from which it is obtained. See J.
Sambrook et al., Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor
Laboratory (1989), which is hereby incorporated by reference in its entirety.

As used herein, a first DNA (RNA) sequence is at least 70% and preferably at least 80%
identical to another DNA (RNA) sequence if there is at least 70% and preferably at least
80% or 90% identity, respectively, between the bases of the first sequence and the
bases of the other sequence, when properly aligned with each other, for example when
aligned by BLASTN.

"Identity", as the term is used herein, refers to a polynucleotide sequence which
comprises a percentage of the same bases as a reference polynucleotide (SEQ ID NOS:
1-14 and 57-60). For example, a polynucleotide which is at least 90% identical to a
reference polynucleotide has polynucleotide bases which are identical in 90% of the
bases which make up the reference polynucleotide and may have different bases in 10%
of the bases which comprise that polynucleotide sequence.
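
The identity figures used above can be made concrete with a small sketch. It assumes the
two sequences have already been aligned to equal length (for example by BLASTN), with
"-" marking a gap. The function name and the example sequences are hypothetical.
Identity is counted against the bases of the reference, which also reproduces the
statement that a 95 bps fragment has 95% identity with the 100 bps sequence from which
it is obtained.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percentage of reference bases matched by an already-aligned candidate.

    Both strings must have the same aligned length; '-' denotes a gap.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    reference, candidate = reference.upper(), candidate.upper()
    reference_bases = sum(1 for base in reference if base != "-")
    if reference_bases == 0:
        raise ValueError("reference contains no bases")
    matches = sum(
        1 for ref, cand in zip(reference, candidate) if ref == cand and ref != "-"
    )
    return 100.0 * matches / reference_bases


# Hypothetical 20-base reference and a candidate with two substitutions.
reference = "ATGGCTAAAGGTCTGACCGA"
candidate = "ATGGCCAAAGGTCTGACCGT"
print(percent_identity(reference, candidate))

# A 95-base fragment of a hypothetical 100-base sequence, padded with gaps.
full_length = "ACGT" * 25
fragment = full_length[:95] + "-" * 5
print(percent_identity(full_length, fragment))
```

Counting against the reference length is one convention among several; alignment tools
often report identity over the aligned region instead, which would give 100% for the
fragment case.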

The present invention relates to polynucleotides which differ from the reference
polynucleotide such that the changes are silent changes, for example the changes do not
alter the amino acid sequence encoded by the polynucleotide. The present invention also
relates to nucleotide changes which result in amino acid substitutions, additions,
deletions, fusions and truncations in the polypeptide encoded by the reference
polynucleotide. In a preferred aspect of the invention these polypeptides retain the same
biological action as the polypeptide encoded by the reference polynucleotide.

It is also appreciated that such probes can be and are preferably labeled with an
analytically detectable reagent to facilitate identification of the probe. Useful reagents
include but are not limited to radioactivity, fluorescent dyes or enzymes capable of
catalyzing the formation of a detectable product. The probes are thus useful to isolate
complementary copies of DNA from other sources or to screen such sources for related
sequences.

The polynucleotides of this invention were recovered from genomic gene libraries from
the organisms listed in Table 1. For example, gene libraries can be generated in the
Lambda ZAP II cloning vector (Stratagene Cloning Systems). Mass excisions can be
performed on these libraries to generate libraries in the pBluescript phagemid. Libraries
are thus generated and excisions performed according to the protocols/methods
hereinafter described.

The excision libraries are introduced into the E. coli strain BW14893 F'kan1A.
Expression clones are then identified using a high temperature filter assay. Expression
clones encoding several glucanases and several other glycosidases are identified and
repurified. The polynucleotides, and enzymes encoded thereby, of the present invention,
yield the activities as described above.

The coding sequences for the enzymes of the present invention were identified by
screening the genomic DNAs prepared for the clones having glucosidase or galactosidase
activity.

An example of such an assay is a high temperature filter assay wherein expression clones
were identified using buffer Z (see recipe below) containing 1 mg/ml of the substrate
5-bromo-4-chloro-3-indolyl-β-D-glucopyranoside (XGLU) (Diagnostic Chemicals Limited
or Sigma) after introducing an excision library into the E. coli strain BW14893 F'kan1A.
Expression clones encoding XGLUases were identified and repurified from M11TL,
OC1/4V, Pyrococcus furiosus VC1, Staphylothermus marinus F1, Thermococcus 9N-2,
Thermotoga maritima MSB8, Thermococcus alcaliphilus AEDII12RA, and Thermococcus
chitonophagus GC74.

Z-buffer: (referenced in Miller, J.H. (1992) A Short Course in Bacterial Genetics, p.
445.)

per liter:

Na2HPO4-7H2O           16.1 g
NaH2PO4-7H2O           5.5 g
KCl                    0.75 g
MgSO4-7H2O             0.246 g
β-mercaptoethanol      2.7 ml
Adjust pH to 7.0
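
Because the recipe is stated per liter, smaller batches require proportional scaling.
The following minimal sketch shows that arithmetic; the dictionary simply transcribes
the quantities listed above, and the function name and the 250 ml example volume are
hypothetical.

```python
# Per-liter quantities transcribed from the Z-buffer recipe above.
Z_BUFFER_PER_LITER = {
    "Na2HPO4-7H2O": (16.1, "g"),
    "NaH2PO4-7H2O": (5.5, "g"),
    "KCl": (0.75, "g"),
    "MgSO4-7H2O": (0.246, "g"),
    "beta-mercaptoethanol": (2.7, "ml"),
}


def scale_z_buffer(volume_ml: float) -> dict:
    """Return the amount of each component needed for volume_ml of buffer."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    factor = volume_ml / 1000.0
    return {
        name: (round(amount * factor, 4), unit)
        for name, (amount, unit) in Z_BUFFER_PER_LITER.items()
    }


# Hypothetical example: a 250 ml batch.
for name, (amount, unit) in scale_z_buffer(250).items():
    print(f"{name}: {amount} {unit}")
```

The pH adjustment to 7.0 is not modeled here and still has to be carried out on the
prepared solution.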

High Temperature Filter Assay

(1) The F factor F'kan (from E. coli strain CSH118)(1) was introduced into the
    pho-pnh-lac- strain BW14893(2). The filamentous phage library was plated on
    the resulting strain, BW14893 F'kan. (Miller, J.H. (1992) A Short Course in
    Bacterial Genetics; Lee, K.S., Metcalf, et al. (1992) Evidence for two
    phosphonate degradative pathways in Enterobacter aerogenes, J. Bacteriol.,
    174:2501-2510.)

(2) After growth on 100 mm LB plates containing 100 µg/ml ampicillin, 80
    µg/ml methicillin and 1 mM IPTG, colony lifts were performed using
    Millipore HATF membrane filters.

(3) The colonies transferred to the filters were lysed with chloroform vapor in
    150 mm glass petri dishes.

(4) The filters were transferred to 100 mm glass petri dishes containing a piece
    of Whatman 3MM filter paper saturated with buffer.

    (a) When testing for galactosidase activity (XGALase), the 3MM paper
        was saturated with Z buffer containing 1 mg/ml XGAL (ChemBridge
        Corporation). After transferring the filter bearing lysed colonies to
        the glass petri dish, the dish was placed in an oven at 80-85 °C.

    (b) When testing for glucosidase activity (XGLUase), the 3MM paper was
        saturated with Z buffer containing 1 mg/ml XGLU. After transferring
        the filter bearing lysed colonies to the glass petri dish, the dish
        was placed in an oven at 80-85 °C.

(5) "Positives" were observed as blue spots on the filter membranes. The
    following filter rescue technique was used to retrieve plasmid from a lysed
    positive colony. A Pasteur pipette (or glass capillary tube) was used to core
    blue spots on the filter membrane. The small filter disk was placed in an
    Eppendorf tube containing 20 µl water. The Eppendorf tube was incubated at
    75 °C for 5 minutes followed by vortexing to elute plasmid DNA off the
    filter. This DNA was transformed into electrocompetent E. coli DH10B cells
    for Thermotoga maritima MSB8-6G, Staphylothermus marinus F1-12G,
    Thermococcus AEDII12RA-18B/G, Thermococcus chitonophagus GC74-22G, M11TL
    and OC1/4V. Electrocompetent BW14893 F'kan1A E. coli were used for
    Thermococcus 9N2-31B/G and Pyrococcus furiosus VC1-7G1. The filter-lift
    assay was repeated on transformation plates to identify "positives". The
    transformation plates were returned to a 37 °C incubator after the filter
    lift to regenerate colonies. 3 ml of LB liquid containing 100 µg/ml
    ampicillin was inoculated with repurified positives and incubated at 37 °C
    overnight. Plasmid DNA was isolated from these cultures and the plasmid
    insert was sequenced. In some instances where the plates used for the
    initial colony lifts contained non-confluent colonies, a specific colony
    corresponding to a blue spot on the filter could be identified on a
    regenerated plate and repurified directly, instead of using the filter
    rescue technique.

Another example of such an assay is a variation of the high temperature filter assay
wherein colony-laden filters are heat-killed at different temperatures (for example, 105 °C
for 20 minutes) to monitor thermostability. The 3MM paper is saturated with different
buffers (i.e., 100 mM NaCl, 5 mM MgCl2, 100 mM Tris-Cl (pH 9.5)) to determine
enzyme activity under different buffer conditions.
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A p-glucosidase assay may also be employed, wherein GlcppNp is used as an artificial 
substrate (aryl-P-glucosidase). The increase in absorbance at 405 nm as a result of p- 
nitrophenol (pNp) liberation was followed on a Hitachi U-1100 spectrophotometer, 
equipped with a thermostatted cuvette holder. The assays may be performed at 80°C or 
5 90 °C in closed 1-ml quartz cuvette. A standard reaction mixture contains 150 mM 
trisodium substrate, pH 5.0 (at 80°C), and 0,95 mM pNp derivative pNp = 0.561 mM 1 
cm' 1 ). The reaction mixture is allowed to reach the desired temperature, after which the 
reaction is started by injecting an appropriate amount of enzyme (1.06 ml final volume). 

1 U of β-glucosidase activity is defined as that amount required to catalyze the formation
of 1.0 µmol pNp/min. D-cellobiose may also be used as a substrate.
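
As a worked illustration of the unit definition, the following Python sketch converts a measured rate of absorbance increase at 405 nm into units of activity using the Beer-Lambert law. The extinction coefficient (0.561 mM⁻¹ cm⁻¹) and the 1.06 ml final volume are taken from the assay description; the function name, the 1 cm path length and the example slope are assumptions made only for illustration.

```python
# Minimal sketch: absorbance slope -> enzyme units (1 U = 1.0 µmol pNp/min).
EPSILON_PNP = 0.561      # mM^-1 cm^-1, extinction coefficient of pNp from the text
FINAL_VOLUME_ML = 1.06   # ml, final reaction volume from the text


def glucosidase_units(delta_a405_per_min, path_cm=1.0,
                      volume_ml=FINAL_VOLUME_ML, epsilon=EPSILON_PNP):
    """Return activity in units for a given absorbance increase per minute.

    path_cm is an assumed cuvette path length (hypothetical default of 1 cm).
    """
    # Beer-Lambert: concentration change (mM/min) = dA/min / (epsilon * path)
    rate_mm_per_min = delta_a405_per_min / (epsilon * path_cm)
    # mM multiplied by ml gives µmol, so this is µmol pNp released per minute
    return rate_mm_per_min * volume_ml


if __name__ == "__main__":
    # Hypothetical measured slope of 0.2805 absorbance units per minute
    print(round(glucosidase_units(0.2805), 3))
```

Dividing the result by the amount of protein injected would give a specific activity; that step is omitted because the text does not specify enzyme amounts.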

An ONPG assay for β-galactosidase activity is described by Miller, J.H. (1992) A Short
Course in Bacterial Genetics and Miller, J.H. (1992) Experiments in Molecular Genetics,
the contents of which are hereby incorporated by reference in their entirety.

A quantitative fluorometric assay for β-galactosidase specific activity is described by
Youngman, P. (1987) Plasmid Vectors for Recovering and Exploiting Tn917
Transpositions in Bacillus and other Gram-Positive Bacteria. In Plasmids: A Practical
Approach (ed. K. Hardy) pp 79-103. IRL Press, Oxford. A description of the procedure
can be found in Miller (1992) p. 75-77, the contents of which are incorporated by
reference herein in their entirety.

The polynucleotides of the present invention may be in the form of DNA, which DNA
includes cDNA, genomic DNA, and synthetic DNA. The DNA may be double-stranded
or single-stranded, and if single-stranded may be the coding strand or non-coding
(anti-sense) strand. The coding sequence which encodes the mature enzymes may be
identical to the coding sequences shown in Figures 1-18 (SEQ ID NOS: 1-14 and 57-60)
or may be a different coding sequence which, as a result of the
redundancy or degeneracy of the genetic code, encodes the same mature enzymes as the
DNA of Figures 1-18 (SEQ ID NOS: 1-14 and 57-60).
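
The degeneracy referred to above can be illustrated with a short Python sketch that translates two different coding sequences with the standard genetic code and shows that they encode the same peptide. The two 12-base sequences and the function name are hypothetical examples chosen for illustration and are not taken from the Figures.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding-strand DNA sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical sequences differing at the third position of three codons.
seq_a = "ATGGTGAATGCT"
seq_b = "ATGGTCAACGCA"

if __name__ == "__main__":
    print(translate(seq_a), translate(seq_b), seq_a == seq_b)
```

Both sequences translate to the same four residues although the DNA differs, which is the sense in which a "different coding sequence" may encode the same mature enzyme.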

The polynucleotide which encodes the mature enzyme of Figures 1-18 (SEQ ID NOS:
15-28 and 61-64) may include, but is not limited to: only the coding sequence for the
mature enzyme; the coding sequence for the mature enzyme and additional coding
sequence such as a leader sequence or a proprotein sequence; the coding sequence for the
mature enzyme (and optionally additional coding sequence) and non-coding sequence,
such as introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature
enzyme.

Thus, the term "polynucleotide encoding an enzyme (protein)" encompasses a 
polynucleotide which includes only coding sequence for the enzyme as well as a 
polynucleotide which includes additional coding and/or non-coding sequence. 

The present invention further relates to variants of the hereinabove described
polynucleotides which encode fragments, analogs and derivatives of the enzymes
having the deduced amino acid sequences of Figures 1-18 (SEQ ID NOS: 15-28 and
61-64). The variant of the polynucleotide may be a naturally occurring allelic variant of the
polynucleotide or a non-naturally occurring variant of the polynucleotide.

Thus, the present invention includes polynucleotides encoding the same mature enzymes
as shown in Figures 1-18 (SEQ ID NOS: 15-28 and 61-64) as well as variants of such
polynucleotides which variants encode a fragment, derivative or analog of the
enzymes of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64). Such nucleotide variants
include deletion variants, substitution variants and addition or insertion variants.

As hereinabove indicated, the polynucleotides may have a coding sequence which is a
naturally occurring allelic variant of the coding sequences shown in Figures 1-18 (SEQ
ID NOS: 1-14 and 57-60). As known in the art, an allelic variant is an alternate form of
a polynucleotide sequence which may have a substitution, deletion or addition of one or
more nucleotides, which does not substantially alter the function of the encoded enzyme.

Fragments of the full length gene of the present invention may be used as a hybridization
probe for a cDNA or a genomic library to isolate the full length DNA and to isolate other
DNAs which have a high sequence similarity to the gene or similar biological activity.
Probes of this type preferably have at least 10, more preferably at least 15, and even more
preferably at least 30 bases and may contain, for example, 50 or more bases. The
probe may also be used to identify a DNA clone corresponding to a full length transcript
and a genomic clone or clones that contain the complete gene including regulatory and
promoter regions, exons, and introns. An example of a screen comprises isolating the
coding region of the gene by using the known DNA sequence to synthesize an
oligonucleotide probe. Labeled oligonucleotides having a sequence complementary to
that of the gene of the present invention are used to screen a library of genomic DNA to
determine which members of the library the probe hybridizes to.
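
The idea of a probe "complementary to that of the gene" can be sketched in Python as follows. This is a simplified illustration that looks only for exact matches; real hybridization tolerates mismatches depending on conditions. The function names and the short target sequence are hypothetical.

```python
# Translation table for Watson-Crick base pairing (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_sites(probe, target):
    """Return 0-based positions on the target strand where the probe can anneal.

    The probe anneals wherever the target contains the probe's reverse
    complement. Only perfect matches are reported (a simplifying assumption).
    """
    site = reverse_complement(probe).upper()
    target = target.upper()
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


if __name__ == "__main__":
    target = "GGATGCTACCAGAAGGCTTTCTCAA"              # hypothetical library fragment
    probe = reverse_complement("CTACCAGAAGGCTTT")     # 15-base probe for that region
    print(probe, probe_sites(probe, target))
```

A probe of 15 bases is used here because that is one of the preferred minimum lengths stated above.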

The present invention further relates to polynucleotides which hybridize to the
hereinabove-described sequences if there is at least 70%, preferably at least 90%, and
more preferably at least 95% identity between the sequences. The present invention
particularly relates to polynucleotides which hybridize under stringent conditions to the
hereinabove-described polynucleotides. As herein used, the term "stringent conditions"
means hybridization will occur only if there is at least 95% and preferably at least 97%
identity between the sequences. The polynucleotides which hybridize to the hereinabove
described polynucleotides in a preferred embodiment encode enzymes which retain
substantially the same biological function or activity as the mature enzyme encoded by
the DNA of Figures 1-18 (SEQ ID NOS: 1-14 and 57-60).

Alternatively, the polynucleotide may have at least 15 bases, preferably at least 30 bases,
and more preferably at least 50 bases which hybridize to any part of a polynucleotide of
the present invention and which has an identity thereto, as hereinabove described, and
which may or may not retain activity. For example, such polynucleotides may be
employed as probes for the polynucleotides of SEQ ID NOS: 1-14 and 57-60, for
example, for recovery of the polynucleotide or as a diagnostic probe or as a PCR primer.

Thus, the present invention is directed to polynucleotides having at least a 70% identity,
preferably at least 90% identity and more preferably at least a 95% identity to a
polynucleotide which encodes the enzymes of SEQ ID NOS: 15-28 and 61-64 as well as
fragments thereof, which fragments have at least 15 bases, preferably at least 30 bases
and most preferably at least 50 bases, which fragments are at least 90% identical,
preferably at least 95% identical and most preferably at least 97% identical under
stringent conditions to any portion of a polynucleotide of the present invention.
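
The identity thresholds above (70%, 90%, 95%, 97%) can be made concrete with a small Python sketch. The text does not name an alignment method, so this example assumes the two sequences have already been aligned to equal length (with "-" for gaps) and counts identical positions over the alignment length; the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned positions carrying the same residue in both sequences.

    Assumes pre-aligned, equal-length inputs; gap characters never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for a, b in pairs if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity(aligned_a, aligned_b, threshold=70.0):
    """True if the pair reaches the given percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    a = "ATGGTGAATG"   # hypothetical 10-base stretch
    b = "ATGGTCAATG"   # one mismatch relative to a
    print(percent_identity(a, b), meets_identity(a, b, 90.0), meets_identity(a, b, 95.0))
```

Other conventions for the denominator (for example, the length of the shorter sequence) give slightly different values, so the convention should be stated whenever a threshold is applied.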

The present invention further relates to enzymes which have the deduced amino acid
sequences of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64) as well as fragments, analogs
and derivatives of such enzymes.

The terms "fragment," "derivative" and "analog" when referring to the enzymes of
Figures 1-18 (SEQ ID NOS: 15-28 and 61-64) mean enzymes which retain essentially
the same biological function or activity as such enzymes. Thus, an analog includes a
proprotein which can be activated by cleavage of the proprotein portion to produce an
active mature enzyme.

The enzymes of the present invention may be a recombinant enzyme, a natural enzyme
or a synthetic enzyme, preferably a recombinant enzyme.

The fragment, derivative or analog of the enzymes of Figures 1-18 (SEQ ID NOS: 15-28
and 61-64) may be (i) one in which one or more of the amino acid residues are
substituted with a conserved or non-conserved amino acid residue (preferably a
conserved amino acid residue) and such substituted amino acid residue may or may not
be one encoded by the genetic code, or (ii) one in which one or more of the amino acid
residues includes a substituent group, or (iii) one in which the mature enzyme is fused
with another compound, such as a compound to increase the half-life of the enzyme (for
example, polyethylene glycol), or (iv) one in which additional amino acids are fused
to the mature enzyme, such as a leader or secretory sequence or a sequence which is
employed for purification of the mature enzyme or a proprotein sequence. Such
fragments, derivatives and analogs are deemed to be within the scope of those skilled in
the art from the teachings herein.

The enzymes and polynucleotides of the present invention are preferably provided in an
isolated form, and preferably are purified to homogeneity.

The term "isolated" means that the material is removed from its original environment
(e.g., the natural environment if it is naturally occurring). For example, a
naturally-occurring polynucleotide or enzyme present in a living animal is not isolated, but the
same polynucleotide or enzyme, separated from some or all of the coexisting materials
in the natural system, is isolated. Such polynucleotides could be part of a vector and/or
such polynucleotides or enzymes could be part of a composition, and still be isolated in
that such vector or composition is not part of its natural environment.

The enzymes of the present invention include the enzymes of SEQ ID NOS: 15-28 and
61-64 (in particular the mature enzyme) as well as enzymes which have at least 70%
similarity (preferably at least 70% identity) to the enzymes of SEQ ID NOS: 15-28 and
61-64 and more preferably at least 90% similarity (more preferably at least 90% identity)
to the enzymes of SEQ ID NOS: 15-28 and 61-64 and still more preferably at least 95%
similarity (still more preferably at least 95% identity) to the enzymes of SEQ ID NOS:
15-28 and 61-64 and also include portions of such enzymes with such portion of the
enzyme generally containing at least 30 amino acids and more preferably at least 50
amino acids.

As known in the art, "similarity" between two enzymes is determined by comparing the
amino acid sequence and its conserved amino acid substitutes of one enzyme to the
sequence of a second enzyme.

A variant, i.e. a "fragment", "analog" or "derivative" polypeptide, and a reference
polypeptide may differ in amino acid sequence by one or more substitutions, additions,
deletions, fusions and truncations, which may be present in any combination.

Among preferred variants are those that vary from a reference by conservative amino
acid substitutions. Such substitutions are those that substitute a given amino acid in a
polypeptide by another amino acid of like characteristics. Typically seen as conservative
substitutions are the replacements, one for another, among the aliphatic amino acids Ala,
Val, Leu and Ile; interchange of the hydroxyl residues Ser and Thr; exchange of the
acidic residues Asp and Glu; substitution between the amide residues Asn and Gln;
exchange of the basic residues Lys and Arg; and replacements among the aromatic
residues Phe and Tyr.
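
The substitution groups listed above can be encoded directly, as in the following Python sketch. The groups are exactly those named in the preceding paragraph; the function name and the decision to treat an unchanged residue as "not a substitution" are assumptions made for illustration.

```python
# Conservative substitution groups as listed in the text (three-letter codes).
CONSERVATIVE_GROUPS = [
    {"Ala", "Val", "Leu", "Ile"},  # aliphatic
    {"Ser", "Thr"},                # hydroxyl
    {"Asp", "Glu"},                # acidic
    {"Asn", "Gln"},                # amide
    {"Lys", "Arg"},                # basic
    {"Phe", "Tyr"},                # aromatic
]


def is_conservative(original, replacement):
    """True if replacing `original` with `replacement` stays within one group."""
    if original == replacement:
        return False  # identical residues are not a substitution at all
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    for pair in [("Leu", "Ile"), ("Asp", "Lys"), ("Ser", "Thr")]:
        print(pair, is_conservative(*pair))
```

Residues not named in the text (for example Gly, Pro, Cys, Met, His or Trp) fall in no group here, so any substitution involving them is reported as non-conservative under this sketch.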

Most highly preferred are variants which retain the same biological function and activity
as the reference polypeptide from which they vary.

Fragments or portions of the enzymes of the present invention may be employed for
producing the corresponding full-length enzyme by peptide synthesis; therefore, the
fragments may be employed as intermediates for producing the full-length enzymes.
Fragments or portions of the polynucleotides of the present invention may be used to
synthesize full-length polynucleotides of the present invention.

The present invention also relates to vectors which include polynucleotides of the present
invention, host cells which are genetically engineered with vectors of the invention and
the production of enzymes of the invention by recombinant techniques.

Host cells are genetically engineered (transduced or transformed or transfected) with the
vectors of this invention which may be, for example, a cloning vector or an expression
vector. The vector may be, for example, in the form of a plasmid, a viral particle, a
phage, etc. The engineered host cells can be cultured in conventional nutrient media
modified as appropriate for activating promoters, selecting transformants or amplifying
the genes of the present invention. The culture conditions, such as temperature, pH and
the like, are those previously used with the host cell selected for expression, and will be
apparent to the ordinarily skilled artisan.

The polynucleotides of the present invention may be employed for producing enzymes
by recombinant techniques. Thus, for example, the polynucleotide may be included in
any one of a variety of expression vectors for expressing an enzyme. Such vectors
include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., derivatives
of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors derived
from combinations of plasmids and phage DNA; and viral DNA such as vaccinia, adenovirus,
fowl pox virus, and pseudorabies. However, any other vector may be used as long as it
is replicable and viable in the host.

The appropriate DNA sequence may be inserted into the vector by a variety of
procedures. In general, the DNA sequence is inserted into an appropriate restriction
endonuclease site(s) by procedures known in the art. Such procedures and others are
deemed to be within the scope of those skilled in the art.

The DNA sequence in the expression vector is operatively linked to an appropriate
expression control sequence(s) (promoter) to direct mRNA synthesis. As representative
examples of such promoters, there may be mentioned: the LTR or SV40 promoter, the E.
coli lac or trp promoter, the phage lambda PL promoter and other promoters known to control
expression of genes in prokaryotic or eukaryotic cells or their viruses. The expression
vector also contains a ribosome binding site for translation initiation and a transcription
terminator. The vector may also include appropriate sequences for amplifying
expression.

In addition, the expression vectors preferably contain one or more selectable marker
genes to provide a phenotypic trait for selection of transformed host cells such as
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as
tetracycline or ampicillin resistance in E. coli.

The vector containing the appropriate DNA sequence as hereinabove described, as well
as an appropriate promoter or control sequence, may be employed to transform an
appropriate host to permit the host to express the protein.

As representative examples of appropriate hosts, there may be mentioned: bacterial cells,
such as E. coli, Streptomyces, Bacillus subtilis; fungal cells, such as yeast; insect cells
such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes
melanoma; adenoviruses; plant cells, etc. The selection of an appropriate host is deemed
to be within the scope of those skilled in the art from the teachings herein.

More particularly, the present invention also includes recombinant constructs comprising
one or more of the sequences as broadly described above. The constructs comprise a
vector, such as a plasmid or viral vector, into which a sequence of the invention has been
inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment,
the construct further comprises regulatory sequences, including, for example, a promoter,
operably linked to the sequence. Large numbers of suitable vectors and promoters are
known to those of skill in the art, and are commercially available. The following vectors
are provided by way of example. Bacterial: pQE70, pQE60, pQE-9 (Qiagen); pD10,
psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); ptrc99a,
pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pSV2CAT, pOG44,
pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia). However, any other
plasmid or vector may be used as long as it is replicable and viable in the host.

Promoter regions can be selected from any desired gene using CAT (chloramphenicol
acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors
are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3,
T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early,
HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse
metallothionein-I. Selection of the appropriate vector and promoter is well within the
level of ordinary skill in the art.

In a further embodiment, the present invention relates to host cells containing the
above-described constructs. The host cell can be a higher eukaryotic cell, such as a mammalian
cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic
cell, such as a bacterial cell. Introduction of the construct into the host cell can be
effected by calcium phosphate transfection, DEAE-Dextran mediated transfection, or
electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology,
(1986)).

The constructs in host cells can be used in a conventional manner to produce the gene 
product encoded by the recombinant sequence. Alternatively, the enzymes of the 
invention can be synthetically produced by conventional peptide synthesizers. 

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells
under the control of appropriate promoters. Cell-free translation systems can also be
employed to produce such proteins using RNAs derived from the DNA constructs of the
present invention. Appropriate cloning and expression vectors for use with prokaryotic
and eukaryotic hosts are described by Sambrook, et al., Molecular Cloning: A Laboratory
Manual, Second Edition, Cold Spring Harbor, N.Y., (1989), the disclosure of which is
hereby incorporated by reference.

Transcription of the DNA encoding the enzymes of the present invention by higher
eukaryotes is increased by inserting an enhancer sequence into the vector. Enhancers are
cis-acting elements of DNA, usually from about 10 to 300 bp, that act on a promoter to
increase its transcription. Examples include the SV40 enhancer on the late side of the
replication origin (bp 100 to 270), a cytomegalovirus early promoter enhancer, the
polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting transformation of the host cell, e.g., the ampicillin
resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a
highly-expressed gene to direct transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic enzymes such as
3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins,
among others. The heterologous structural sequence is assembled in appropriate phase
with translation initiation and termination sequences, and preferably, a leader sequence
capable of directing secretion of translated enzyme. Optionally, the heterologous
sequence can encode a fusion enzyme including an N-terminal identification peptide
imparting desired characteristics, e.g., stabilization or simplified purification of
expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA
sequence encoding a desired protein together with suitable translation initiation and
termination signals in operable reading phase with a functional promoter. The vector will
comprise one or more phenotypic selectable markers and an origin of replication to
ensure maintenance of the vector and to, if desirable, provide amplification within the
host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis,
Salmonella typhimurium and various species within the genera Pseudomonas,
Streptomyces, and Staphylococcus, although others may also be employed as a matter
of choice.

As a representative but nonlimiting example, useful expression vectors for bacterial use
can comprise a selectable marker and bacterial origin of replication derived from
commercially available plasmids comprising genetic elements of the well known cloning
vector pBR322 (ATCC 37017). Such commercial vectors include, for example,
pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec,
Madison, WI, USA). These pBR322 "backbone" sections are combined with an
appropriate promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host strain to an
appropriate cell density, the selected promoter is induced by appropriate means (e.g.,
temperature shift or chemical induction) and cells are cultured for an additional period.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means,
and the resulting crude extract retained for further purification.

Microbial cells employed in expression of proteins can be disrupted by any convenient
method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell
lysing agents; such methods are well known to those skilled in the art.

Various mammalian cell culture systems can also be employed to express recombinant
protein. Examples of mammalian expression systems include the COS-7 lines of monkey
kidney fibroblasts, described by Gluzman, Cell, 23:175 (1981), and other cell lines
capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines. Mammalian expression vectors will comprise an origin of replication,
a suitable promoter and enhancer, and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites, transcriptional termination
sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the
SV40 splice and polyadenylation sites may be used to provide the required
nontranscribed genetic elements.

The enzyme can be recovered and purified from recombinant cell cultures by methods
including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation
exchange chromatography, phosphocellulose chromatography, hydrophobic interaction
chromatography, affinity chromatography, hydroxylapatite chromatography and lectin
chromatography. Protein refolding steps can be used, as necessary, in completing
configuration of the mature protein. Finally, high performance liquid chromatography
(HPLC) can be employed for final purification steps.

The enzymes of the present invention may be a naturally purified product, or a product
of chemical synthetic procedures, or produced by recombinant techniques from a
prokaryotic or eukaryotic host (for example, by bacterial, yeast, higher plant, insect and
mammalian cells in culture). Depending upon the host employed in a recombinant
production procedure, the enzymes of the present invention may be glycosylated or may
be non-glycosylated. Enzymes of the invention may or may not also include an initial
methionine amino acid residue.

β-Galactosidase hydrolyzes lactose to galactose and glucose. Accordingly, the OC1/4V,
9N2-31B/G, AEDII12RA-18B/G and F1-12G enzymes may be employed in the food
processing industry for the production of low lactose content milk and for the production
of galactose or glucose from lactose contained in whey obtained in a large amount as a
by-product in the production of cheese. Generally, it is desired that enzymes used in
food processing, such as the aforementioned β-galactosidases, be stable at elevated
temperatures to help prevent microbial contamination.

These enzymes may also be employed in the pharmaceutical industry. The enzymes are
used to treat intolerance to lactose. In this case, a thermostable enzyme is desired, as
well. Thermostable β-galactosidases also have uses in diagnostic applications, where
they are employed as reporter molecules.

Glucosidases act on soluble cellooligosaccharides from the non-reducing end to give
glucose as the sole product. Glucanases (endo- and exo-) act in the depolymerization of
cellulose, generating more non-reducing ends (endo-glucanases, for instance, act on
internal linkages yielding cellobiose, glucose and cellooligosaccharides as products).
β-Glucosidases are used in applications where glucose is the desired product. Accordingly,
M11TL, F1-12G, GC74-22G, MSB8-6G, OC1/4V, VC1-7G1, 9N2-31B/G and
AEDII12RA-18B/G may be employed in a wide variety of industrial applications,
including in corn wet milling for the separation of starch and gluten, in the fruit industry
for clarification and equipment maintenance, in baking for viscosity reduction, in the
textile industry for the processing of blue jeans, and in the detergent industry as an
additive. For these and other applications, thermostable enzymes are desirable.

Antibodies generated against the enzymes corresponding to a sequence of the present
invention can be obtained by direct injection of the enzymes into an animal or by
administering the enzymes to an animal, preferably a nonhuman. The antibody so
obtained will then bind the enzymes themselves. In this manner, even a sequence encoding
only a fragment of the enzymes can be used to generate antibodies binding the whole
native enzymes. Such antibodies can then be used to isolate the enzyme from cells
expressing that enzyme.

For preparation of monoclonal antibodies, any technique which provides antibodies
produced by continuous cell line cultures can be used. Examples include the hybridoma
technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma technique, the
human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72), and
the EBV-hybridoma technique to produce human monoclonal antibodies (Cole, et al.,
1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).

Techniques described for the production of single chain antibodies (U.S. Patent
4,946,778) can be adapted to produce single chain antibodies to immunogenic enzyme
products of this invention. Also, transgenic mice may be used to express humanized
antibodies to immunogenic enzyme products of this invention.

Antibodies generated against the enzyme of the present invention may be used in
screening for similar enzymes from other organisms and samples. Such screening
techniques are known in the art; for example, one such screening assay is described in
"Methods for Measuring Cellulase Activities", Methods in Enzymology, Vol 160, pp.
87-116, which is hereby incorporated by reference in its entirety.

The present invention will be further described with reference to the following examples;
however, it is to be understood that the present invention is not limited to such examples.
All parts or amounts, unless otherwise specified, are by weight.

In order to facilitate understanding of the following examples, certain frequently
occurring methods and/or terms will be described.

"Plasmids" are designated by a lower case p preceded and/or followed by capital letters
and/or numbers. The starting plasmids herein are either commercially available, publicly
available on an unrestricted basis, or can be constructed from available plasmids in
accord with published procedures. In addition, equivalent plasmids to those described
are known in the art and will be apparent to the ordinarily skilled artisan.

"Digestion" of DNA refers to catalytic cleavage of the DNA with a restriction enzyme
that acts only at certain sequences in the DNA. The various restriction enzymes used
herein are commercially available and their reaction conditions, cofactors and other
requirements were used as would be known to the ordinarily skilled artisan. For
analytical purposes, typically 1 µg of plasmid or DNA fragment is used with about 2
units of enzyme in about 20 µl of buffer solution. For the purpose of isolating DNA
fragments for plasmid construction, typically 5 to 50 µg of DNA are digested with 20 to
250 units of enzyme in a larger volume. Appropriate buffers and substrate amounts for
particular restriction enzymes are specified by the manufacturer. Incubation times of
about 1 hour at 37 °C are ordinarily used, but may vary in accordance with the supplier's
instructions. After digestion the reaction is electrophoresed directly on a polyacrylamide
gel to isolate the desired fragment.

Size separation of the cleaved fragments is performed using the 8 percent polyacrylamide
gel described by Goeddel, D. et al., Nucleic Acids Res., 8:4057 (1980).

"Oligonucleotides" refers to either a single stranded polydeoxynucleotide or two
complementary polydeoxynucleotide strands which may be chemically synthesized.
Such synthetic oligonucleotides have no 5' phosphate and thus will not ligate to another
oligonucleotide without adding a phosphate with an ATP in the presence of a kinase. A
synthetic oligonucleotide will ligate to a fragment that has not been dephosphorylated.

"Ligation" refers to the process of forming phosphodiester bonds between two double
stranded nucleic acid fragments (Maniatis, T., et al., Id., p. 146). Unless otherwise
provided, ligation may be accomplished using known buffers and conditions with 10
units of T4 DNA ligase ("ligase") per 0.5 µg of approximately equimolar amounts of the
DNA fragments to be ligated.

Unless otherwise stated, transformation was performed as described in the method of
Graham, F. and Van der Eb, A., Virology, 52:456-457 (1973).

Example 1 

Bacterial Expression and Purification of Glycosidase Enzymes

DNA encoding the enzymes of the present invention, SEQ ID NOS: 1-14 and 57-60, was
initially amplified from a pBluescript vector containing the DNA by the PCR technique
using the primers noted herein. The amplified sequences were then inserted into the
respective pQE vector listed beneath the primer sequences, and the enzyme was
expressed according to the protocols set forth herein. The 5' and 3' primer sequences for
the respective genes are as follows:

Thermococcus AEDII12RA-18B/G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGGTGAATGCTATGATTGTC 3' (SEQ ID NO:29)
3' CGGAAGATCTTCATAGCTCCGGAAGCCCATA 5' (SEQ ID NO:30)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3'
BglII.

OC1/4V - 33B/G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGATAAGAAGGTCCGATTTTCC 3' (SEQ ID NO:31)

3' CGGAAGATCTTTAAGATTTTAGAAATTCCTT 5' (SEQ ID NO:32)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3'
BglII.

Thermococcus 9N2 - 31B/G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGCTACCAGAAGGCTTTCTC 3' (SEQ ID NO:33)

3' CGGAGGTACCTCACCCAAGTCCGAACTTCTC 5' (SEQ ID NO:34)

Vector: pQE30; and contains the following restriction enzyme sites 5' EcoRI and 3'
KpnI.

Staphylothermus marinus F1 - 12G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGATAAGGTTTCCTGATTAT 3' (SEQ ID NO:35)

3' CGGAAGATCTTTATTCGAGGTTCTTTAATCC 5' (SEQ ID NO:36)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3'
BglII.

Thermococcus chitonophagus GC74 - 22G 

5' CCGAGAATTCATTCATTAAAGAGGAGAAATTAACTATGCTTCCAGGAGAACTTTCTC 3' (SEQ ID NO:37)

3' CGGAGGATCCCTACCCCTCCTCTAAGATCTC 5' (SEQ ID NO:38)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3'
BamHI.

M11TL 

5' AATAATCTAGAGCATGCAATTCCCCAAAGACTTCATGATAG 3' (SEQ ID NO:39)
3' AATAAAAGCTTACTGGATCAGTGTAAGATGCT 5' (SEQ ID NO:40)

Vector: pQE70; and contains the following restriction enzyme sites 5' SphI and 3'
Hind III.

Thermotoga maritima MSB8 - 6G

5' CCGACAATTGATTAAAGAGGAGAAATTAACTATGGAAAGGATCGATGAAATT 3' (SEQ ID NO:41)
3' CGGAGGTACCTCATGGTTTGAATCTCTTCTC 5' (SEQ ID NO:42) 

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' 
KpnI. 

Pyrococcus furiosus VC1 - 7G1 

5' CCGACAATTGATTAAAGAGGAGAAATTAACTATGTTCCCTGAAAAGTTCCTT 3' (SEQ ID NO:43)
3' CGGAGGTACCTCATCCCCTCAGCAATTCCTC 5' (SEQ ID NO:44) 

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' 
KpnI. 

Bankia gouldi endoglucanase (37GP1)

5' AATAAGGATCCGTTTAGCGACGCTCGC 3' (SEQ ID NO:45)

3' AATAAAAGCTTCCGGGTTGTACAGCGGTAATAGGC 5' (SEQ ID NO:46)

Vector: pQE52; and contains the following restriction enzyme sites 5' BamHI and 3'
Hind III.

Thermotoga maritima α-galactosidase (6GC2)

5' TTTATTGAATTCATTAAAGAGGAGAAATTAACTATGATCTGTGTGGAAATATTCGGAAAG 3' (SEQ ID NO:47)

3' TCTATAAAGCTTTCATTCTCTCTCACCCTCTTCGTAGAAG 5' (SEQ ID NO:48)

Vector: pQET; and contains the following restriction enzyme sites 5' EcoRI and 3'
Hind III.

Thermotoga maritima β-mannanase (6GP2)

5' TTTATTCAATTGATTAAAGAGGAGAAATTAACTATGGGGATTGGTGGCGACGAC 3' (SEQ ID NO:49)

3' TTTATTAAGCTTATCTnTCATATTCACATACCTCC 5' (SEQ ID NO:50)

Vector: pQEt; and contains the following restriction enzyme sites 5' Hind III and 3'
EcoRI.

AEPII 1a β-mannanase (63GB1)

5' TTTATTGAATTCATTAAAGAGGAGAAATTAACTATGCTACCAGAAGAGTTCCTATGGGGC 3' (SEQ ID NO:51)

3' TTTATTAAGCTTCTCATCAACGGCTATGGTCTTCATTrC 5' (SEQ ID NO:52) 

Vector: pQEt; and contains the following restriction enzyme sites 5' Hind III and 3' 
EcoRI. 

OC1/4V endoglucanase (33GP1) 

5' AAAAAACAATTGAATTCATTAAAGAGGAGAAATTAACTATGGTAGAAAGACACTTCAGATATGTTCTT 3' (SEQ ID NO:53)

3' TITrrCGGATCCAATTCTTCATTTACTCTTTGCCTG 5' (SEQ ID NO:54) 

Vector: pQEt; and contains the following restriction enzyme sites 5' BamHI and 3'
EcoRI.

Thermotoga maritima pullulanase (6GP3)

(SEQ ID NO:55)

3' ATAAGAAGCTTTTCACTCTCTGTACAGAACGTACGC 5' (SEQ ID NO:56)

Vector: pQEt; and contains the following restriction enzyme sites 5' EcoRI and 3'
Hind III.

The restriction enzyme sites indicated correspond to the restriction enzyme sites on the
bacterial expression vector indicated for the respective gene (Qiagen, Inc. Chatsworth,
CA). The pQE vector encodes antibiotic resistance (Amp^r), a bacterial origin of
replication (ori), an IPTG-regulatable promoter operator (P/O), a ribosome binding site
(RBS), a 6-His tag and restriction enzyme sites.
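
Each primer above carries a restriction site near one end that matches a site on the listed vector. As a minimal sketch of how such a pairing might be checked, the Python snippet below scans a primer string, exactly as printed, for a few standard recognition sequences. The helper name `find_sites` and the choice of enzymes in the table are illustrative assumptions; the two primers are SEQ ID NO:29 and SEQ ID NO:30 as listed above.

```python
# Standard recognition sequences for the enzymes named in the vector notes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "KpnI": "GGTACC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "SphI": "GCATGC",
}

# Primers copied from the listing above, read left to right as printed.
PRIMERS = {
    "SEQ ID NO:29": "CCGAGAATTCATTAAAGAGGAGAAATTAACTATGGTGAATGCTATGATTGTC",
    "SEQ ID NO:30": "CGGAAGATCTTCATAGCTCCGGAAGCCCATA",
}


def find_sites(primer):
    """Return {enzyme: [0-based start positions]} for every site found in primer."""
    primer = primer.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = [
            i
            for i in range(len(primer) - len(site) + 1)
            if primer[i:i + len(site)] == site
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

All six recognition sequences in the table are palindromic, so scanning one strand is sufficient for this check.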

The pQE vector was digested with the restriction enzymes indicated. The amplified
sequences were ligated into the respective pQE vector and inserted in frame with the
sequence encoding for the RBS. The ligation mixture was then used to transform the E.
coli strain M15/pREP4 (Qiagen, Inc.) by electroporation. M15/pREP4 contains multiple
copies of the plasmid pREP4, which expresses the lacI repressor and also confers
kanamycin resistance (Kan^r). Transformants were identified by their ability to grow on
LB plates, and ampicillin/kanamycin resistant colonies were selected. Plasmid DNA was
isolated and confirmed by restriction analysis. Clones containing the desired constructs
were grown overnight (O/N) in liquid culture in LB media supplemented with both Amp
(100 µg/ml) and Kan (25 µg/ml). The O/N culture was used to inoculate a large culture
at a ratio of 1:100 to 1:250. The cells were grown to an optical density at 600 nm (O.D.600) of
between 0.4 and 0.6. IPTG ("Isopropyl-β-D-thiogalactopyranoside") was then added
to a final concentration of 1 mM. IPTG induces by inactivating the lacI repressor,
clearing the P/O and leading to increased gene expression. Cells were grown an extra 3 to
4 hours. Cells were then harvested by centrifugation.
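
The culture conditions above reduce to simple dilution arithmetic (C1 x V1 = C2 x V2). The Python sketch below works through a 1 litre culture; the culture volume and the stock concentrations (100 mg/ml ampicillin, 25 mg/ml kanamycin, 1 M IPTG) are assumptions made only for illustration and are not stated in this specification.

```python
def stock_volume_ml(final_conc, stock_conc, culture_ml):
    """Volume of stock to add, from C1*V1 = C2*V2 (both concentrations in the same unit)."""
    return final_conc * culture_ml / stock_conc


def inoculum_range_ml(culture_ml, low_ratio=250, high_ratio=100):
    """Overnight-culture volume for inoculation at 1:250 (minimum) to 1:100 (maximum)."""
    return culture_ml / low_ratio, culture_ml / high_ratio


if __name__ == "__main__":
    culture_ml = 1000.0  # hypothetical culture volume
    # Final concentrations come from the text; stock concentrations are assumed.
    amp_ml = stock_volume_ml(100.0, 100_000.0, culture_ml)   # µg/ml
    kan_ml = stock_volume_ml(25.0, 25_000.0, culture_ml)     # µg/ml
    iptg_ml = stock_volume_ml(1.0, 1000.0, culture_ml)       # mM
    low, high = inoculum_range_ml(culture_ml)
    print(f"Amp stock: {amp_ml} ml, Kan stock: {kan_ml} ml, IPTG stock: {iptg_ml} ml")
    print(f"O/N inoculum: {low} to {high} ml")
```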

The primer sequences set out above may also be employed to isolate the target gene from
the deposited material by hybridization techniques described above.

Example 2

Isolation of a Selected Clone from the Deposited Genomic Clones

A clone is isolated directly by screening the deposited material using the
oligonucleotide primers set forth in Example 1 for the particular gene desired to be
isolated. The specific oligonucleotides are synthesized using an Applied Biosystems
DNA synthesizer. The oligonucleotides are labeled with 32P-γ-ATP using T4
polynucleotide kinase and purified according to a standard protocol (Maniatis et al.,
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor,
NY, 1982). The deposited clones in the pBluescript vectors may be employed to
transform bacterial hosts which are then plated on 1.5% agar plates to the density of
20,000-50,000 pfu/150 mm plate. These plates are screened using Nylon membranes
according to the standard screening protocol (Stratagene, 1993). Specifically, the
Nylon membrane with denatured and fixed DNA is prehybridized in 6 x SSC, 20 mM
NaH2PO4, 0.4% SDS, 5 x Denhardt's, 500 µg/ml denatured, sonicated salmon sperm
DNA; and 6 x SSC, 0.1% SDS. After one hour of prehybridization, the membrane is
hybridized with hybridization buffer (6 x SSC, 20 mM NaH2PO4, 0.4% SDS, 500 µg/ml
denatured, sonicated salmon sperm DNA) with 1 x 10^6 cpm/ml 32P-probe overnight at
42°C. The membrane is washed at 45-50°C with washing buffer (6 x SSC, 0.1% SDS)
for 20-30 minutes, dried, and exposed to Kodak X-ray film overnight. Positive clones
are isolated and purified by secondary and tertiary screening. The purified clone is
sequenced to verify its identity to the primer sequence.

Once the clone is isolated, the two oligonucleotide primers corresponding to the gene
of interest are used to amplify the gene from the deposited material. A polymerase
chain reaction is carried out in 25 µl of reaction mixture with 0.5 µg of the DNA of
the gene of interest. The reaction mixture is 1.5-5 mM MgCl2, 0.01% (w/v) gelatin,
20 µM each of dATP, dCTP, dGTP, dTTP, 25 pmol of each primer and 0.25 Unit of
Taq polymerase. Thirty five cycles of PCR (denaturation at 94°C for 1 min;
annealing at 55°C for 1 min; elongation at 72°C for 1 min) are performed with the
Perkin-Elmer Cetus automated thermal cycler. The amplified product is analyzed by
agarose gel electrophoresis and the DNA band with expected molecular weight is
excised and purified. The PCR product is verified to be the gene of interest by
subcloning and sequencing the DNA product. The ends of the newly purified genes
are nucleotide sequenced to identify full length sequences. Complete sequencing of
full length genes is then performed by Exonuclease III digestion or primer walking.
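
The PCR conditions described above can be written down as a small data structure, which makes the arithmetic explicit. The Python sketch below is illustrative only: the class and function names are hypothetical, and the computed time counts only the programmed hold steps, ignoring ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# One cycle as described in the text: 94°C, 55°C and 72°C, 1 min each.
CYCLE = (
    Step("denaturation", 94, 60),
    Step("annealing", 55, 60),
    Step("elongation", 72, 60),
)
N_CYCLES = 35
REACTION_UL = 25.0


def total_hold_minutes(cycle=CYCLE, n_cycles=N_CYCLES):
    """Total programmed hold time in minutes (temperature ramping not included)."""
    return n_cycles * sum(step.seconds for step in cycle) / 60


def pmol_in_reaction(conc_um, volume_ul=REACTION_UL):
    """Amount in pmol of a component at conc_um (µM x µl = pmol)."""
    return conc_um * volume_ul


def conc_um_from_pmol(pmol, volume_ul=REACTION_UL):
    """Concentration in µM of a component added as pmol per reaction."""
    return pmol / volume_ul


if __name__ == "__main__":
    print(f"hold time: {total_hold_minutes()} min")
    print(f"each dNTP at 20 µM: {pmol_in_reaction(20.0)} pmol per 25 µl")
    print(f"each primer at 25 pmol: {conc_um_from_pmol(25.0)} µM")
```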

Example 3 
Screening for Galactosidase Activity 

α-Galactosidase protein activity may be screened for as follows:

Substrate plates were provided by a standard plating procedure. Dilute
XL1-Blue MRF' E. coli host (Stratagene Cloning Systems, La Jolla, CA) to O.D.600
= 1.0 with NZY media. In 15 ml tubes, inoculate 200 µl diluted host cells with phage.
Mix gently and incubate tubes at 37 °C for 15 min. Add approximately 3.5 ml LB top
agarose (0.7%) containing 1 mM IPTG to each tube and pour onto an NZY plate
surface. Allow to cool and incubate at 37 °C overnight. The assay plates contain
the substrate p-nitrophenyl α-galactoside (Sigma) (200 mg/100 ml) in 100
mM NaCl, 100 mM potassium phosphate and 1% (w/v) agarose. The plaques are
overlayed with nitrocellulose and incubated at 4 °C for 30 minutes whereupon the
nitrocellulose is removed and overlayed onto the substrate plates. The substrate
plates are then incubated at 70 °C for 20 minutes.

Example 4 

Screening of Clones for Mannanase Activity 

A solid phase screening assay was utilized as a primary screening method to test
clones for β-mannanase activity.

A culture solution of the Y1090- E. coli host strain (Stratagene Cloning Systems, La
Jolla, CA) was diluted to O.D.600 = 1.0 with NZY media. The amplified
Thermotoga maritima lambda gt11 library was diluted in SM (phage dilution buffer):
5 x 10^7 pfu/µl diluted 1:1000 then 1:100 to 5 x 10^2 pfu/µl. Then 8 µl of phage
dilution (5 x 10^2 pfu/µl) was plated in 200 µl host cells. They were then incubated in
15 ml tubes at 37 °C for 15 minutes.
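
The two-step phage dilution above can be checked with a few lines of Python. This is only a worked-arithmetic sketch; the function names are hypothetical, and the numbers are the ones given in the text.

```python
def serial_dilution(start_titer, factors):
    """Apply successive dilution factors (e.g. 1000 for a 1:1000 step) to a titer."""
    titer = start_titer
    for factor in factors:
        titer /= factor
    return titer


def pfu_plated(titer_pfu_per_ul, volume_ul):
    """Number of plaque forming units delivered in the plated volume."""
    return titer_pfu_per_ul * volume_ul


if __name__ == "__main__":
    final_titer = serial_dilution(5e7, [1000, 100])  # pfu/µl after both steps
    print(f"final titer: {final_titer:g} pfu/µl")
    print(f"plated per tube: {pfu_plated(final_titer, 8):g} pfu")
```

With the stated values, the 8 µl aliquot carries on the order of a few thousand plaque forming units per plate.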

Approximately 4 ml of molten LB top agarose (0.7%) at approximately 52 °C was
added to each tube and the mixture was poured onto the surface of LB agar plates.
The agar plates were then incubated at 37 °C for five hours. The plates were
replicated and induced with 10 mM IPTG-soaked Duralon-UV™ nylon membranes
(Stratagene Cloning Systems, La Jolla, CA) overnight. The nylon membranes and
plates were marked with a needle to keep their orientation and the nylon membranes
were then removed and stored at 4 °C.

An Azo-galactomannan overlay was applied to the LB plates containing the lambda
plaques. The overlay contains 1% agarose, 50 mM potassium-phosphate buffer pH 7,
0.4% Azocarob-galactomannan (Megazyme, Australia). The plates were incubated
at 72 °C. The Azocarob-galactomannan treated plates were observed after 4 hours
then returned to incubation overnight. Putative positives were identified by clearing
zones on the Azocarob-galactomannan plates. Two positive clones were observed.

The nylon membranes referred to above, which correspond to the positive clones,
were retrieved, oriented over the plate and the portions matching the locations of the
clearing zones for positive clones were cut out. Phage was eluted from the membrane
cut-out portions by soaking the individual portions in 500 µl SM (phage dilution
buffer) and 25 µl CHCl3.

Example 5 

Screening of Clones for Mannosidase Activity 

A solid phase screening assay was utilized as a primary screening method to test
clones for β-mannosidase activity.

A culture solution of the Y1090- E. coli host strain (Stratagene Cloning Systems, La
Jolla, CA) was diluted to O.D.600 = 1.0 with NZY media. The amplified
AEPII 1a lambda gt11 library was diluted in SM (phage dilution buffer): 5 x 10^7
pfu/µl diluted 1:1000 then 1:100 to 5 x 10^2 pfu/µl. Then 8 µl of phage dilution
(5 x 10^2 pfu/µl) was plated in 200 µl host cells. They were then incubated in 15 ml
tubes at 37 °C for 15 minutes.

Approximately 4 ml of molten LB top agarose (0.7%) at approximately 52 °C was
added to each tube and the mixture was poured onto the surface of LB agar plates.
The agar plates were then incubated at 37 °C for five hours. The plates were
replicated and induced with 10 mM IPTG-soaked Duralon-UV™ nylon membranes
(Stratagene Cloning Systems, La Jolla, CA) overnight. The nylon membranes and
plates were marked with a needle to keep their orientation and the nylon membranes
were then removed and stored at 4 °C.

A p-nitrophenyl-β-D-mannopyranoside overlay was applied to the LB plates
containing the lambda plaques. The overlay contains 1% agarose, 50 mM
potassium-phosphate buffer pH 7, 0.4% p-nitrophenyl-β-D-mannopyranoside (Megazyme,
Australia). The plates were incubated at 72 °C. The p-nitrophenyl-β-D-mannopyranoside
treated plates were observed after 4 hours then returned to incubation
overnight. Putative positives were identified by clearing zones on the
p-nitrophenyl-β-D-mannopyranoside plates. Two positive clones were observed.

The nylon membranes referred to above, which correspond to the positive clones,
were retrieved, oriented over the plate and the portions matching the locations of the
clearing zones for positive clones were cut out. Phage was eluted from the membrane
cut-out portions by soaking the individual portions in 500 µl SM (phage dilution
buffer) and 25 µl CHCl3.

Example 6
Screening for Pullulanase Activity 

Pullulanase protein activity may be screened for as follows:

Substrate plates were provided by a standard plating procedure. Host cells
are diluted to O.D.600 = 1.0 with NZY or appropriate media. In 15 ml tubes, inoculate
200 µl diluted host cells with phage. Mix gently and incubate tubes at 37 °C for 15
min. Approximately 3.5 ml LB top agarose (0.7%) is added to each tube and the
mixture is plated, allowed to cool, and incubated at 37°C for about 28 hours.
Overlays of 4.5 ml of the following substrate are poured:

100 ml total volume

0.5 g    Red Pullulan (Megazyme, Australia)
1.0 g    Agarose
5 ml     Buffer (Tris-HCl pH 7.2 @ 75 °C)
2 ml     5 M NaCl
5 ml     CaCl2 (100 mM)
85 ml    dH2O

Plates are cooled at room temperature, and then incubated at 75 °C for 2 hours.
Positives are observed as showing substrate degradation.
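
For orientation, the final concentrations implied by the overlay recipe above can be worked out as follows. This Python sketch assumes that the stated 100 ml total is the final volume of the mixture; the function names are hypothetical.

```python
import math

TOTAL_ML = 100.0  # stated total volume of the overlay mixture (taken as final volume)


def final_conc(stock_conc, stock_ml, total_ml=TOTAL_ML):
    """Final concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_ml / total_ml


def percent_w_v(grams, total_ml=TOTAL_ML):
    """Weight/volume percentage (g per 100 ml)."""
    return 100.0 * grams / total_ml


def overlays_per_batch(overlay_ml=4.5, total_ml=TOTAL_ML):
    """Whole number of overlays of overlay_ml that one batch can supply."""
    return math.floor(total_ml / overlay_ml)


if __name__ == "__main__":
    print(f"NaCl: {final_conc(5.0, 2.0)} M")            # 2 ml of 5 M stock
    print(f"CaCl2: {final_conc(100.0, 5.0)} mM")        # 5 ml of 100 mM stock
    print(f"Red Pullulan: {percent_w_v(0.5)} % (w/v)")
    print(f"Agarose: {percent_w_v(1.0)} % (w/v)")
    print(f"4.5 ml overlays per batch: {overlays_per_batch()}")
```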

Example 7 

Screening for Endoglucanase Activity

Endoglucanase protein activity may be screened for as follows:

1. The gene library is plated onto 6 LB/GelRite/0.1% CMC/NZY agar plates
(~4,800 plaque forming units/plate) in E. coli host with LB agarose as top agarose.
The plates are incubated at 37 °C overnight.

2. Plates are chilled at 4°C for one hour.

3. The plates are overlayed with Duralon membranes (Stratagene) at
room temperature for one hour and the membranes are oriented and lifted off the
plates and stored at 4°C.

4. The top agarose layer is removed and plates are incubated at 37 °C
for ~3 hours.

5. The plate surface is rinsed with NaCl.

6. The plate is stained with 0.1% Congo Red for 15 minutes.

7. The plate is destained with 1 M NaCl.

8. The putative positives identified on plate are isolated from the
Duralon membrane (positives are identified by clearing zones around clones). The
phage is eluted from the membrane by incubating in 500 µl SM + 25 µl CHCl3.

9. Insert DNA is subcloned into any appropriate cloning vector and
subclones are reassayed for CMCase activity using the following protocol:

i) Spin 1 ml overnight miniprep of clone at maximum speed for 3 minutes.

ii) Decant the supernatant and use it to fill "wells" that have been made in an LB/GelRite/0.1% CMC plate.

iii) Incubate at 37°C for 2 hours.

iv) Stain with 0.1% Congo Red for 15 minutes.

v) Destain with 1 M NaCl for 15 minutes.

vi) Identify positives by clearing zone around clone.

Numerous modifications and variations of the present invention are possible in light
of the above teachings and, therefore, within the scope of the appended claims, the
invention may be practiced otherwise than as particularly described.

WHAT IS CLAIMED IS:

1. An isolated polynucleotide selected from the group consisting of:

(a) SEQ ID NOS: 1-14 and 57-60; 

(b) SEQ ID NOS: 1-14 and 57-60, wherein T can also be U; 

(c) polynucleotide sequences complementary to SEQ ID NOS: 1-14
and 57-60;

(d) polynucleotide sequences which encode an amino acid sequence as
set forth in SEQ ID NOS: 15-28, and 61-64; and

(e) fragments of (a), (b), (c) or (d) that are at least 15 consecutive
bases in length and that will selectively hybridize to DNA which
encodes a polypeptide of SEQ ID NOS: 15-28, and 61-64.

2. A vector comprising a polynucleotide of claim 1. 

3. A host cell containing the vector of claim 2. 



4. The method of claim 3, wherein the host cell is a eukaryotic cell. 

5. The method of claim 3, wherein the host cell is a prokaryotic cell. 

6. A method for producing a polypeptide comprising: 

(a) culturing the host cells of claim 3; 

(b) expressing from the host cell of claim 3 a polypeptide encoded by 
said polynucleotide; and 

(c) isolating the polypeptide.

7. An enzyme selected from the group consisting of:

(a) an enzyme comprising an amino acid sequence set forth in SEQ ID 
NOS: 15-28 or 61-64; and 

(b) an enzyme which comprises at least 30 consecutive amino acid 
residue as an enzyme of (a). 

8. An enzyme of which at least a portion is coded for by a polynucleotide of
claim 1, and which is selected from the group consisting of:

(a) an enzyme comprising an amino acid sequence which is at least 
70% identical to an amino acid sequence selected from the group 
of amino acid sequences set forth in SEQ ID NOS:15-28 or 61-64; 
and 

(b) an enzyme which comprises at least 30 amino acid residues to the 
enzyme of (a). 

9. A method for generating glucose from soluble cell oligosaccharides
comprising contacting a sample containing oligosaccharides with an
effective amount of an enzyme selected from the group consisting of an
enzyme having the amino acid sequence set forth in SEQ ID NOS: 15-28,
61-63 and 64 such that glucose is produced.

10. The method of claim 9, wherein the sample is selected from the group
consisting of dairy products, fruit juices, detergents, textiles, guar gum,
animal feed, plant biomass and waste products.

11. The method of claim 9, wherein the oligosaccharide is selected from the
group consisting of maltose, cellobiose, lactose, sucrose, raffinose,
stachyose, verbascose, cellulose, starch, amylose, glycogen, disaccharides,
polysaccharides and pullulan.

1131 ccc cct tac cxc ctc acc ccc tac Cro tac tco ccc era jlcc cw wc tac cw toj ccc uep.

441 Al« CVy Tyr A*p V»l Arg Gty ?yr ttu Tyt Trp AU L*u Thr Alp Asn Tyr Clu Ttp AU 4*0 

.311 CTC CCt TTC AQG ATG XCC TTC COC CTC TAT AAA CTC CAT CTC ATX ACC AAC GAG A6A ACA 1440 

441 Imi Cly fh« Arg *«c Axg »h« Gly l» v Tyr tyx Vol Amp L«u lim Thr L/i Olu Ar? fhx 480 

1441 CCC COC GAG GAA ACC CTA AAC GIT TAT XCC CCC ATC CTC CAC AAC AAC CCA CTC AOC AAC 1500 

481 fro Arg Clu Clu Stx V*l Ly« v»l Tyr Atq Ciy Uu Vol Olu Ajo a*i Cly v*l ser Lyi 300 

1501 CAA ATC COC CAS WvO TTC CCA CTT CflG TCA 1510 

^01 Clu II* Aro Clu Lya Pba Gly -*u Oly *nd UQ

1201 GAG GAG TAC ATA AAA A AO ATG A**»A (iAA ACA GAG
401 Clu Clu Tyr Ik Lys Ly* Mel Arc Glu Thr C,w 

- (2M GCA ACG CTC ATA AAA CCG AAA CTC CCA CAC AAT 

421 Cty Thr Vnt lie Lys Pn» Ly* Uu Pm Glu Aw 

1321 CCT CCA AAC AAA AAC CAT CTT CCA CTT CTT CTC 

441 Pt» Pro Lys lys Asn Aip Val AU Val Val Val 

081 CAC ACA AAC CCG CTC AAA CCT CAC TTC TAC CTC 

461 Aip Arg lys Pro Val Ly* Cly Asp Phe Tyr Uu 

1441 ACC CTC TCO AAA CAA TTC CAC CAT CAG CCT AAC 

481 Thr Vat Ser Lys Gtu Phe His Asp Gin G)y Lyi 

1301 ACT CCC ATC CAA CTC GCA AGC TGG AGA 1 GAC CTT 

5Q1 Ser Pro lie Clu Vil AU Scr Trp Ar| Asp Leu 

15*1 GCO GGA CAG GAG ATC GCA AGA ATA CTG CCC CAT 

521 AU GJy Gin Clu Met Gly Arg lie Val AU Asp 

162 1 GGA AAA CTT CCA ACG ACC TTC CCG AAG GAT TAC 

541 Gly Lyi Uu Pro Thr Thr Phc Pro Lyi Aip Tyr 

I Ml GGA GAG CCA AAG GAC AAT CCG CAA AGA CTG GTG 

561 Gly Glu Pro Lys Asp Asn Pro Gln Arg Val Val

581 Arg Tyr Tyr Asp Thr Phe Gly Val Glu Pro Ala

I BO I ACA AAG TTT GAA TAC AAA GAT TTA AAA ATC GCT 

601 Thr Lyi Phe Glu Tyr Lyi Asp Uu Lys he Ala 

I86J TAC ACG ATC ACA AAC ACT GOG CAC AGA GCT GGA 

621 Tyr Thr Ile Thr Asn Thr Gly Asp Arg Ala Gly

2161 CCA TGA 2166

THERMOCOCCUS AEDII12RA GLYCOSIDASE (18B/G)
COMPLETE GENE SEQUENCE - 9/93

THERMOCOCCUS CHITONOPHAGUS GLYCOSIDASE - 22G
COMPLETE SEQUENCE - 9/95

1 TTC CTT CCA CAG AAC m CTC TGC CCA CTT TCA CAC TCC CCA TTC CAC TTT CAA ATG 60 

1 Met Leu Pro Glu Asn Phe Leu Trp Gly Val Ser Gln Ser Gly Phe Gln Phe Glu Met Gly 20

61 GAC AGA CTC AGG ACG CAC ATT GAT CCA AAC ACA CAT TCC TCG TAC TGC GTA ACA GAT CAA 120 

21 Asp Arg Leu Arg Arg His Ile Asp Pro Asn Thr Asp Trp Trp Tyr Trp Val Arg Asp Glu 40

121 TAT AAT ATC AAA AAA CCA CTA GTA ACT CCC CAT CTT CCC CAA CAC CCT ATA AAT TCA TAT 100 

41 Tyr Asn Ile Lys Lys Gly Leu Val Ser Gly Asp Leu Pro Glu Asp Gly Ile Asn Ser Tyr 60

181 CAA TTA TAT CAC AGA GAC CAA CAA ATT CCA AAC CAT TTA GCC CTC AAC ACA TAT AGG ATC 240 

61 Glu Leu Tyr Glu Arg Asp Gln Glu Ile Ala Lys Asp Leu Gly Leu Asn Thr Tyr Arg Ile 80

241 CCA ATT CAA TCC ACC ACA CTA TTT CCA TGC CCA ACG ACT TTT CTC CAC CTC GAC TAT CAA 300 

81 Gly Ile Glu Trp Ser Arg Val Phe Pro Trp Pro Thr Thr Phe Val Asp Val Glu Tyr Glu 100

301 ATT CAT GAG TCT TAC CGG TTC GTA AAC GAT CTC AAC ATT TCT AAA CAC GCA TTA CAA AAA 360 

101 Ile Asp Glu Ser Tyr Gly Leu Val Lys Asp Val Lys Ile Ser Lys Asp Ala Leu Glu Lys 120

3«1 CTT GAT CAA ATC GCT AAC CAA AGG CAA ATA ATA TAT TAT AGG AAC CTA ATA AAT TCC CTA 420 

121 Leu Asp Glu Ile Ala Asn Gln Arg Glu Ile Ile Tyr Tyr Arg Asn Leu Ile Asn Ser Leu 140

421 ACA AAG ACG CCT TTT AAC CTA ATA CTA AAC CTA AAT CAT TTT ACC CTC CCA ATA TCC CTT 480 

141 Arg Lys Arg Gly Phe Lys Val Ile Leu Asn Leu Asn His Phe Thr Leu Pro Ile Trp Leu 160

481 CAT CAT CCT ATC CAA TCT AGA CAA AAA CCC CTC ACC AAT AAG ACA AAC CGA TCG GTA ACC 540 

161 His Asp Pro Ile Glu Ser Arg Glu Lys Ala Leu Thr Asn Lys Arg Asn Gly Trp Val Ser 180

541 GAA AGG ACT CTT ATA GAG TTT GCA AAA TTT CCC CCC TAT TTA GCA TAT AAA TTC GCA GAC 600 

181 Glu Arg Ser Val Ile Glu Phe Ala Lys Phe Ala Ala Tyr Leu Ala Tyr Lys Phe Gly Asp 200

601 ATA CTA GAC ATG TCG ACC ACA TTT AAT CAA CCT ATC CTC CTC CCC GAC TTC COG TAT TTA 660 

201 Ile Val Asp Met Trp Ser Thr Phe Asn Glu Pro Met Val Val Ala Glu Leu Gly Tyr Leu 220

661 GCC CCA TAC TCA CCA TTC CCC CCC GGA CTC ATC AAT CCA GAA GCA GCA AAG TTA CTT ATG 720 

221 Ala Pro Tyr Ser Gly Phe Pro Pro Gly Val Met Asn Pro Glu Ala Ala Lys Leu Val Met 240

721 CTA CAT ATG ATA AAC CCC CAT CCT TTA CCA TAT AGG ATC ATA AAG AAA TTT CAC AGA AAA 780 . 

241 Leu His Met Ile Asn Ala His Ala Leu Ala Tyr Arg Met Ile Lys Lys Phe Asp Arg Lys 260

781 AAA CCT GAT CCA GAA TCA AAA GAA CCA CCT GAA ATA GCA ATT ATA TAC AAT AAC ATC CCC 840 

261 Lys Ala Asp Pro Glu Ser Lys Glu Pro Ala Glu Ile Gly Ile Ile Tyr Asn Asn Ile Gly 280

841 CTC ACA TAT CCG TTT AAT CCG AAA CAC TCA AAC CAT CTA CAA CCA TCC GAT AAT GCC AAT 900 

281 Val Thr Tyr Pro Phe Asn Pro Lys Asp Ser Lys Asp Leu Gln Ala Ser Asp Asn Ala Asn 300

501 TTC TTC CAC ACT CCC CTA TTC TTA ACC CCT ATC CAC ACC GGA AAA TTA AAT ATC GAA TTT 960 

301 Phe Phe His Ser Gly Leu Phe Leu Thr Ala Ile His Arg Gly Lys Leu Asn Ile Glu Phe 320

961 GAC GGA GAG ACA TTT CTT TAC CTT CCA TAT TTA AAC GCC AAT GAT TGC CTC GGA GTG AAT 1020 

321 Asp Gly Glu Thr Phe Val Tyr Leu Pro Tyr Leu Lys Gly Asn Asp Trp Leu Gly Val Asn 340

1021 TAT TAT ACA ACA GAA CTC CTT AAA TAC CAA GAT CCC ATC TTT CCA ACT ATC CCT CTC ATA 10B0 

341 Tyr Tyr Thr Arg Glu Val Val Lys Tyr Gln Asp Pro Met Phe Pro Ser Ile Pro Leu Ile 360

1081 ACC TTC AAG GGC CTT CCA CAT TAT CCA TAC CCA TCT ACA CCA CGA ACC ACC TCA AAG CAC 1140 

361 Ser Phe Lys Gly Val Pro Asp Tyr Gly Tyr Gly Cys Arg Pro Gly Thr Thr Ser Lys Asp 380

1141 GGT AAT CCT CTT ACT GAC ATT GGA TCG GAG CTA TAT CCC AAA GGC ATG TAC GAC TCT ATA 1200 

381 Gly Asn Pro Val Ser Asp Ile Gly Trp Glu Val Tyr Pro Lys Gly Met Tyr Asp Ser Ile 400

1201 GTA CCT CCC AAT CAA TAT CGA CTT CCT CTA TAC CTA ACA CAA AAC GGA ATA CCA CAT TCA 1260 

401 Val Ala Ala Asn Glu Tyr Gly Val Pro Val Tyr Val Thr Glu Asn Gly Ile Ala Asp Ser 420

1261 AAA CAT CTA TTA ACC CCC TAT TAC ATC CCA TCT CAC ATT GAA CCC ATC CAA CAC CCT TAC 1320 

421 Lys Asp Val Leu Arg Pro Tyr Tyr Ile Ala Ser His Ile Glu Ala Met Glu Glu Ala Tyr 440

1321 KhA AAT CGT TAT CAC CTG ACA CCA TAC TTA CAC TCC CCA TTA ACC OAT AAT TAC CAA TOT 1380

441 Glu Asn Gly Tyr Asp Val Arg Gly Tyr Leu His Trp Ala Leu Thr Asp Asn Tyr Glu Trp 460

1581 CCC TTA CCC TTC ACA ATG ACC TTT CCC TTC TAC C AA CT A AAC TTC ATA ACC AAA CAC ACA U40 

461 Ala Leu Gly Phe Arg Met Arg Phe Gly Leu Tyr Glu Val Asn Leu Ile Thr Lys Glu Arg 480

1441 AAA CCC AGO AAA AAC ACT CTA ACA CTA TTC ACA CAC ATA CTT ATT AAT AAT CCC CTA ACA 1500 

481 Lys Pro Arg Lys Lys Ser Val Arg Val Phe Arg Glu Ile Val Ile Asn Asn Gly Leu Thr 500

1501 AGC AAC ATC ACC AAA GAG ATC TTA CAC CAC CGC TAC 1536 

501 Ser Asn Ile Arg Lys Glu Ile Leu Glu Glu Gly End 512

PYROCOCCUS FURIOSUS GLYCOSIDASE - 7G1
COMPLETE GENE SEQUENCE - 10/95

1 ATG TTC CCT GAA AAG TTC CTT --- GGT GTG --- CAA TCG GGT TTT CAG TTT GAA ATC GGG 60

1 Met Phe Pro Glu Lys Phe Leu Trp Gly Val Ala Gln Ser Gly Phe Gln Phe Glu Met Gly 20

61 GAT AAA CTC AGG AGG AAA --- --- ACT AAC ACT GAT TGG TGG CAC TGC GTA AGG GAT AAG 120

21 Asp --- Leu Arg Arg Asn Ile Asp Thr Asn Thr Asp Trp Trp His Trp Val Arg Asp Lys 40

121 ACA AAT ATA GAG AAA --- CTC GTT AGT GGA GAT CTT CCC GAG GAG GGG ATT AAC AAT TAC 180

41 Thr Asn Ile Glu Lys Gly Leu Val Ser Gly Asp Leu Pro Glu Glu Gly Ile Asn Asn Tyr 60

181 GAG --- --- --- --- GAC CAT GAG ATT GCA AGA AAG CTG GGT CTT AAT CCT TAC AGA ATA 240

61 Glu Leu Tyr Glu --- Asp His Glu Ile Ala Arg Lys Leu Gly Leu Asn Ala Tyr Arg Ile 80

241 GGC ATA GAG TGG AGC ACA ATA TTC CCA TGG CCA ACG ACA TTT ATT GAT CTT GAT TAT AGC 300

81 Gly Ile Glu Trp Ser Arg Ile Phe Pro Trp Pro Thr Thr Phe Ile Asp Val Asp Tyr Ser 100

301 TAT AAT GAA TCA TAT AAC CTT ATA GAA GAT GTA AAG ATC ACC AAG GAC ACT TTG GAG GAG 360

101 Tyr Asn Glu Ser Tyr Asn Leu Ile Glu Asp Val Lys Ile Thr Lys Asp Thr Leu Glu Glu 120

361 TTA GAT GAG ATC GCC AAC AAG AGG GAG GTG GCC TAC TAT AGG TCA GTC ATA AAC AGC CTC 420

121 Leu Asp Glu Ile Ala Asn --- Arg Glu Val Ala Tyr Tyr Arg Ser Val Ile Asn Ser Leu 140

421 AGC AGC AAG GGG TTT AAG GTT ATA GTT AAT CTA AAT CAC TTC ACC CTT CCA TAT TGG TTG 480

141 Arg --- Lys Gly Phe --- Val Ile Val Asn Leu Asn His Phe Thr Leu Pro Tyr Trp Leu 160

481 CAT GAT CCC ATT GAG GCT AGG GAG AGG GCG TTA ACT AAT AAG AGG AAC GGC TGG CTT AAC 540

161 His Asp Pro Ile Glu Ala Arg Glu Arg Ala Leu Thr Asn Lys Arg Asn Gly Trp Val Asn 180

541 CCA ACA ACA GTT ATA GAG TTT CCA AAG TAT GCC GCT TAC ATA GCC TAT AAG TTT GGA GAT 600

181 Pro Arg Thr Val Ile Glu Phe Ala Lys Tyr Ala Ala Tyr Ile Ala Tyr Lys Phe Gly Asp 200

601 ATA GTG GAT ATG TGG AGC ACG TTT AAT GAG CCT ATG GTG GTT GTT GAC CTT GGC TAC CTA 660

201 Ile Val Asp Met Trp Ser Thr Phe Asn Glu Pro Met Val Val Val Glu Leu Gly Tyr Leu 220

661 GCC CCC TAC TCT GGC TTC CCT CCA GGG GTT CTA AAT CCA GAG GCC GCA AAG CTC CCC ATA 720

221 Ala Pro Tyr Ser Gly Phe Pro Pro Gly Val Leu Asn Pro Glu Ala Ala --- Leu Ala Ile 240

721 CTT CAC ATG ATA AAT GCA CAT CCT TTA GCT TAT AGG CAG ATA AAG AAG TTT GAC ACT GAG 780

241 Leu His Met Ile Asn Ala His Ala Leu Ala Tyr Arg Gln Ile Lys Lys Phe Asp Thr Glu 260

781 AAA GCT GAT AAG GAT TCT AAA GAG CCT GCA GAA GTT GGT ATA ATT TAC AAC AAC ATT CGA 840

261 Lys Ala Asp Lys --- Ser Lys Glu Pro Ala Glu Val Gly Ile Ile Tyr Asn Asn Ile Gly 280

841 GTT GCT TAT CCC AAG GAT CCG AAC GAT TCC AAG GAT CTT AAG GCA GCA GAA AAC GAC AAC 900

281 Val Ala Tyr Pro Lys Asp Pro Asn Asp Ser Lys Asp Val Lys Ala Ala Glu Asn Asp Asn 300

901 TTC TTC CAC TCA GGG CTG TTC TTC GAC GCC ATA CAC AAA GGA AAA CTT AAT ATA GAC TTT 960

301 Phe Phe His Ser Gly Leu Phe Phe Glu Ala Ile His Lys Gly Lys Leu Asn Ile Glu Phe 320

961 GAC GGT GAA ACG TTT ATA GAT GCC CCC TAT CTA AAG GGC AAT GAC TGG ATA GGG GTT AAT 1020

321 Asp Gly Glu Thr Phe Ile Asp Ala Pro Tyr Leu Lys Gly Asn Asp Trp Ile Gly Val Asn 340

1021 TAC TAC ACA AGG GAA GTA GTT ACG TAT CAG GAA CCA ATG TTT CCT TCA ATC CCC CTG ATC 1080

341 Tyr Tyr Thr Arg Glu Val Val Thr Tyr Gln Glu Pro Met Phe Pro Ser Ile Pro Leu Ile 360

1081 ACC TTT AAG GGA GTT CAA GGA TAT GGC TAT GCC TGC AGA CCT GGA ACT CTG TCA AAC GAT 1140

361 Thr Phe Lys Gly Val Gln Gly Tyr Gly Tyr Ala Cys Arg Pro Gly Thr Leu Ser Lys Asp 380

1141 GAC AGA CCC CTC AGC GAC ATA GGA TGG GAA CTC TAT CCA GAG GGC ATG TAC GAT TCA ATA 1200

381 Asp Arg Pro Val Ser Asp Ile Gly Trp Glu Leu Tyr Pro Glu Gly Met Tyr Asp Ser Ile 400

1201 GTT GAA GCT CAC AAG TAC GGC GTT CCA GTT TAC GTG ACC GAG AAC GGA ATA GCG GAT TCA 1260

401 Val Glu Ala His Lys Tyr Gly Val Pro Val Tyr Val Thr Glu Asn Gly Ile Ala Asp Ser 420

1261 AAG GAC ATC CTA AGA CCT TAG TAC ATA CCC AGC CAC ATA AAG ATG ATA GAG AAC CCC TTT 1320

421 Lys Asp Ile Leu Arg Pro Tyr Tyr Ile Ala Ser His Ile Lys Met Ile Glu Lys Ala Phe 440

1321 GAG GAT CGC TAT GAA GTT AAC GGC TAC TTC CAC TCC CCA TTA ACT GAC AAC TTC GAG TGG 1380 

441 Glu Asp Gly Tyr Glu Val Lys Gly Tyr Phe His Trp Ala Leu Thr Asp Asn Phe Glu Trp 460

1381 OCT CTC CCG TTT AGA ATC CCC TTT GGC CTC TAC GAA GTC AAC CTA ATT ACA AAG GAG AGA 1440 

461 Ala Leu Gly Phe Arg Met Arg Phe Gly Leu Tyr Glu Val Asn Leu Ile Thr Lys Glu Arg 480

1441 ATT CCC AGG GAG AAG AGC GTG TCG ATA TTC AGA GAG ATA GTA GCC AAT AAT GGT GTT ACG 1500 

48 J lie Pro Arg Glu Lya Ser Val S«r Tie Phe Arg Glu He V*l Al* Aan A*n Gly v*i Thr 500 

1501 AAA AAG AIT GAA GAG GAA TTG CTC AGG GGA TGA 1533 

501 Lya Lya He Glu Glu Glu Leu Leu Arg Gly End 511 



Figure 8b (Continued)

Bankia gouldi endoglucanase (37GP1)

9 18 27 36 45 54

5' ATG ACA ATA CGT TTA GCC ACG CTC GCG CTC TGC GCA GCG CTG AGC CCA GTC ACC
Met Arg Ile Arg Leu Ala Thr Leu Ala Leu Cys Ala Ala Leu Ser Pro Val Thr

63 72 81 90 99 108

TTT CCA AAT GTA ACC GTA CAA ATC GAC GCC GAC GCC GGT AAA AAA CTC ATC
Phe Ala Asp Asn Val Thr Val Gln Ile Asp Ala Asp Gly Gly Lys Lys Leu Ile

117 126 135 144 153 162

ACC CGA GCC CTT TAC GCC ATC AAT AAC TCC AAC CCA CAA ACC CTT ACC GAT ACT
Ser Arg Ala Leu Tyr Gly Met Asn Asn Ser Asn Ala Glu Ser Leu Thr Asp Thr

171 180 189 198 207 216

GAC TGG CAG CGT TTT CGC GAT GCA CGT GTG CGC ATG CTG CGG GAA AAT GGC CGC
Asp Trp Gln Arg Phe Arg Asp Ala Gly Val Arg Met Leu Arg Glu Asn Gly Gly

225 234 243 252 261 270

AAC AAC AGC ACC AAA TAT AAC TGG CAA CTG CAC CTG AGC AGT CAT CCG GAT TGG
Asn Asn Ser Thr Lys Tyr Asn Trp Gln Leu His Leu Ser Ser His Pro Asp Trp

279 288 297 306 315 324

TAC AAC AAT GTC TAC GCC CGC AAC AAC AAC TGG GAC AAC CGG GTA GCC CTG ATT
Tyr Asn Asn Val Tyr Ala Gly Asn Asn Asn Trp Asp Asn Arg Val Ala Leu Ile

333 342 351 360 369 378

CAG GAA AAC CTG CCC GGC GCC GAC ACC ATG TGG GCA TTC CAG CTC ATC GGT AAG
Gln Glu Asn Leu Pro Gly Ala Asp Thr Met Trp Ala Phe Gln Leu Ile Gly Lys

387 396 405 414 423 432

GTC GCG GCG ACT TCT GCC TAC AAC TTT AAC GAT TGG GAA TTC AAC CAG TCG CAA
Val Ala Ala Thr Ser Ala Tyr Asn Phe Asn Asp Trp Glu Phe Asn Gln Ser Gln

441 450 459 468 477 486

TGG TGG ACC GGC GTC GCT CAG AAT CTC GCT GGC GGC GGT GAA CCC AAT CTG GAC
Trp Trp Thr Gly Val Ala Gln Asn Leu Ala Gly Gly Gly Glu Pro Asn Leu Asp

495 504 513 522 531 540

GGC GGC GGC GAA GCG CTG GTT GAA GGA GAC CCC AAT CTC TAC CTC ATG GAT TGG
Gly Gly Gly Glu Ala Leu Val Glu Gly Asp Pro Asn Leu Tyr Leu Met Asp Trp

549 558 567 576 585 594

TCG CCA GCC GAC ACT GTO GGT ATT CTC GAC CAC TGG TTT GGC GTA AAC GGG CTC
Ser Pro Ala Asp Thr Val Gly Ile Leu Asp His Trp Phe Gly Val Asn Gly Leu

603 612 621 630 639 648

GGC GTO CGG CGT GGC AAA GCC AAA TAC TGG AGT ATG GAT AAC GAG CCC GGC ATC
Gly Val Arg Arg Gly Lys Ala Lys Tyr Trp Ser Met Asp Asn Glu Pro Gly Ile

657 666 675 684 693 702

TGG CTT GGC ACC CAC GAC GAT GTA GTG AAA GAA CAA ACG CCG GTA GAA GAT TTC
Trp Val Gly Thr His Asp Asp Val Val Lys Glu Gln Thr Pro Val Glu Asp Phe

Figure 9a

Bankia gouldi endoglucanase (37GP1) (continued)

711 720 729 738 747 756

CTG CAC ACC TAT TTC GAA ACC GCC AAA AAA GCC CGC GCC AAA TTT CCC GGT ATT
Leu His Thr Tyr Phe Glu Thr Ala Lys Lys Ala Arg Ala Lys Phe Pro Gly Ile

765 774 783 792 801 810

AAA ATC ACC GCT CCG CTG CCC GCT AAT GAG TGG CAC TGG TAT CCC TGG GGC GGT
Lys Ile Thr Gly Pro Val Pro Ala Asn Glu Trp Gln Trp Tyr Ala Trp Gly Gly

819 828 837 846 855 864

TTC TCG GTA CCC CAG GAA CAA GGG TTT ATG AGC TGG ATG GAG TAT TTC ATC AAG
Phe Ser Val Pro Gln Glu Gln Gly Phe Met Ser Trp Met Glu Tyr Phe Ile Lys

873 882 891 900 909 918

CGG GTG TCP GAA GAG CAA CGC CCA ACT GCT GTT CGC CTC CTC GAT GTA CTC GAT
Arg Val Ser Glu Glu Gln Arg Ala Ser Gly Val Arg Leu Leu Asp Val Leu Asp

927 936 945 954 963 972

CTG CAC TAC TAC CCC GGC GCT TAC AAT GCG GAA GAT ATC GTG CAA TTA CAT CGC
Leu His Tyr Tyr Pro Gly Ala Tyr Asn Ala Glu Asp Ile Val Gln Leu His Arg

981 990 999 1008 1017 1026

ACG TTC TTC GAC CGC GAC TTT GTT TCA CTG GAT CCC AAC GGG GTG AAA ATG GTA
Thr Phe Phe Asp Arg Asp Phe Val Ser Leu Asp Ala Asn Gly Val Lys Met Val

1035 1044 1053 1062 1071 1080

GAA GGT GGC TGG GAT GAC AGC ATC AAC AAG GAA TAT ATT TTC GGG CGA GTG AAC
Glu Gly Gly Trp Asp Asp Ser Ile Asn Lys Glu Tyr Ile Phe Gly Arg Val Asn

1089 1098 1107 1116 1125 1134

GAT TGG CTC GAG GAA TAT ATG GGG CCA GAC CAT GGT GTA ACC CTG GGC TTA ACC
Asp Trp Leu Glu Glu Tyr Met Gly Pro Asp His Gly Val Thr Leu Gly Leu Thr

1143 1152 1161 1170 1179 1188

GAA ATG TGC GTG CGC AAT GTG AAT CCG ATG ACT ACC GCC ATC TGG TAT GCC TCC
Glu Met Cys Val Arg Asn Val Asn Pro Met Thr Thr Ala Ile Trp Tyr Ala Ser

1197 1206 1215 1224 1233 1242

ATG CTC GGC ACC TTC GCG GAT AAC GGC GTC CAA ATA TTC ACC CCA TGG TGC TGG
Met Leu Gly Thr Phe Ala Asp Asn Gly Val Glu Ile Phe Thr Pro Trp Cys Trp

1251 1260 1269 1278 1287 1296

AAC ACC GGA ATG TGG GAA ACA CTC CAC CTC TTC AGC CGC TAC AAC AAA CCT TAT
Asn Thr Gly Met Trp Glu Thr Leu His Leu Phe Ser Arg Tyr Asn Lys Pro Tyr

1305 1314 1323 1332 1341 1350

CGG GTC GCC TCC AGC TCC ACT CTT GAA GAG TTT GTC AGC GCC TAC AGC TCC ATT
Arg Val Ala Ser Ser Ser Ser Leu Glu Glu Phe Val Ser Ala Tyr Ser Ser Ile

1359 1368 1377 1386 1395 1404

AAC GAA GCA GAA GAC GCC ATG ACG GTA CTT CTG GTG AAT CGT TCC ACT AGC GAG
Asn Glu Ala Glu Asp Ala Met Thr Val Leu Leu Val Asn Arg Ser Thr Ser Glu

Figure 9b (Continued)

Complete Gene Sequence

9 18 27 36 45 54

GTG ATC TCT CTC GAA ATA VTQ GGA AAC ACC TTC AGA GAG OCA AGA TTC CTT CTC

Val Ile Cys Val Glu Ile Phe Gly Lys Thr Phe Arg Glu Gly Arg Phe Val Leu

63 72 81 90 99 108

AAA GAG AAA AAC TTC ACA CTT GAG TTC GCG GTG GAG AAG ATA CAC CTT CQC TOC

Lys Glu Lys Asn Phe Thr Val Glu Phe Ala Val Glu Lys Ile His Leu Gly Trp

117 126 135 144 153 162

AAG ATC TCC GGC AGG GTG AAG CCA ACT CCG GCA AGG CTT GAG GTT CTT CGA ACG

Lys Ile Ser Gly Arg Val Lys Gly Ser Pro Gly Arg Leu Glu Val Leu Arg Thr

171 180 189 198 207 216

AAA GCA CCG GAA AAG GTA CTT GTG AAC AAC TGG CAG TCC TGG GGA CCG TGC AGG

Lys Ala Pro Glu Lys Val Leu Val Asn Asn Trp Gln Ser Trp Gly Pro Cys Arg

225 234 243 252 261 270

GTG CTC GAT CCC TTT TCT TTC AAA CCA CCT GAA ATA GAT CCG AAC TGG AGA TAC

Val Val Asp Ala Phe Ser Phe Lys Pro Pro Glu Ile Asp Pro Asn Trp Arg Tyr

279 288 297 306 315 324

ACC GCT TCG GTG GTG CCC GAT GTA CTT GAA AGG AAC CTC CAG AGC GAC TAT TTC

Thr Ala Ser Val Val Pro Asp Val Leu Glu Arg Asn Leu Gln Ser Asp Tyr Phe

333 342 351 360 369 378

GTG GCT GAA GAA GGA AAA GTG TAC OCT TTT CTG ACT TCG AAA ATC GCA CAT CCT

Val Ala Glu Glu Gly Lys Val Tyr Gly Phe Leu Ser Ser Lys Ile Ala His Pro

387 396 405 414 423 432

TTC TTC GCT GTG GAA GAT GCG GAA CTT GTG GCA TAC CTC GAA TAT TTC GAT GTC

Phe Phe Ala Val Glu Asp Gly Glu Leu Val Ala Tyr Leu Glu Tyr Phe Asp Val

441 450 459 468 477 486

GAG TTC GAC GAC TTT GTT CCT CTT GAA CCT CTC GTT GTA CTC GAG GAT CCC AAC

Glu Phe Asp Asp Phe Val Pro Leu Glu Pro Leu Val Val Leu Glu Asp Pro Asn

495 504 513 522 531 540

ACA CCC CTT CTT CTG GAG AAA TAC GCG GAA CTC CTC GGA ATG GAA AAC AAC CCC

Thr Pro Leu Leu Leu Glu Lys Tyr Ala Glu Leu Val Gly Met Glu Asn Asn Ala

549 558 567 576 585 594

AGA GTT CCA AAA CAC ACA CCC ACT GGA TGG TGC AGC TGC TAC CAT TAC TTC CTT

Arg Val Pro Lys His Thr Pro Thr Gly Trp Cys Ser Trp Tyr His Tyr Phe Leu

Figure 10a

Thermotoga maritima Alpha-galactosidase

603 612 621 630 639 648

GAT CTC ACC TGG CAA GAG A£X CTC AAG AAC CTC AAG CTC OCO AAC AAT TTC CCC

Asp Leu Thr Trp Glu Glu Thr Leu Lys Asn Leu Lys Leu Ala Lys Asn Phe Pro

657 666 675 684 693 702

TTC GAG CTC TTC CAG ATA CAC GAC CCC TAC GAA AAC CAC ATA GGT GAC TGG CTC

Phe Glu Val Phe Gln Ile Asp Asp Ala Tyr Glu Lys Asp Ile Gly Asp Trp Leu

711 720 729 738 747 756

CIXJ ACA AGA GGA GAC TTT CCA TCG GTG GAA GAG ATG GCA AAA GTT ATA GCG GAA

Val Thr Arg Gly Asp Phe Pro Ser Val Glu Glu Met Ala Lys Val Ile Ala Glu

765 774 783 792 801 810

AAC GGT TTC ATC CCG GGC ATA TGG ACC GCC CCG TTC ACT GTT TCT GAA ACC TCG

Asn Gly Phe Ile Pro Gly Ile Trp Thr Ala Pro Phe Ser Val Ser Glu Thr Ser

819 828 837 846 855 864

GAT GTA TTC AAC GAA CAT CCG CAC TGG GTA GTG AAG GAA AAC GGA GAG CCG AAG

Asp Val Phe Asn Glu His Pro Asp Trp Val Val Lys Glu Asn Gly Glu Pro Lys

873 882 891 900 909 918

ATG GCT TAC ACA AAC TGG AAC AAA AAG ATA TAC GCC CTC GAT CTT TCG AAA GAT

Met Ala Tyr Arg Asn Trp Asn Lys Lys Ile Tyr Ala Leu Asp Leu Ser Lys Asp

927 936 945 954 963 972

CAG GTT CTC AAC TGG CTT TTC CAT CTC TTC TCA TCT CTG AGA AAG ATG GGC TAC

Glu Val Leu Asn Trp Leu Phe Asp Leu Phe Ser Ser Leu Arg Lys Met Gly Tyr

981 990 999 1008 1017 1026

AGG TAC TTC AAG ATC GAC TTT CTC TTC GCG GGT GCC GTT CCA GGA GAA AGA AAA

Arg Tyr Phe Lys Ile Asp Phe Leu Phe Ala Gly Ala Val Pro Gly Glu Arg Lys

1035 1044 1053 1062 1071 1080

AAG AAC ATA ACA CCA ATT CAG CCG TTC AGA AAA GGG ATT GAG AGG ATC AGA AAA

Lys Asn Ile Thr Pro Ile Gln Ala Phe Arg Lys Gly Ile Glu Thr Ile Arg Lys

1089 1098 1107 1116 1125 1134

GCG GTG GGA GAA GAT TCT TTC ATC CTC GGA TGC GGC TCT CCC CTT CTT CCC CCA

Ala Val Gly Glu Asp Ser Phe Ile Leu Gly Cys Gly Ser Pro Leu Leu Pro Ala

1143 1152 1161 1170 1179 1188

CTC OCA TGC GTC GAC COG ATC AGG ATA GGA CCT CAC ACT GCG CCG TTC TGG GGA

Val Gly Cys Val Asp Gly Met Arg Ile Gly Pro Asp Thr Ala Pro Phe Trp Gly

Figure 10b (Continued)

1197 1206 1215 1224 1233 1242

CAA CAT ATA GAA CAC AAC GGA GCT CCC GCT GCA AGA TCG CCC CTG AGA AAC GCC

Glu His Ile Glu Asp Asn Gly Ala Pro Ala Ala Arg Trp Ala Leu Arg Asn Ala

1251 1260 1269 1278 1287 1296

ATA ACG AGG TAC TTC ATC CAC GAC AGG TTC TGC CTG AAC GAC CCC GAC TGT CTG

Ile Thr Arg Tyr Phe Met His Asp Arg Phe Trp Leu Asn Asp Pro Asp Cys Leu

1305 1314 1323 1332 1341 1350

ATA CTC AGA GAG GAG AAA ACG GAT CTC ACA CAG AAG GAA AAC GAG CTC TAC TCG

Ile Leu Arg Glu Glu Lys Thr Asp Leu Thr Gln Lys Glu Lys Glu Leu Tyr Ser

1359 1368 1377 1386 1395 1404

TAC ACG TGT GGA GTG CTC GAC AAC ATC ATC ATA GAA AGC GAT GAT CTC TCG CTC

Tyr Thr Cys Gly Val Leu Asp Asn Met Ile Ile Glu Ser Asp Asp Leu Ser Leu

1413 1422 1431 1440 1449 1458

GTC AGA GAT CAT GCA AAA AAG GTT CTC AAA GAA ACG OCC GAA CTC CTC GGT GGA

Val Arg Asp His Gly Lys Lys Val Leu Lys Glu Thr Leu Glu Leu Leu Gly Gly

1467 1476 1485 1494 1503 1512

AGA CCA CGG GTT CAA AAC ATC ATG TCG GAG GAT CTG AGA ©C GAG ATC GTC TCG

Arg Pro Arg Val Gln Asn Ile Met Ser Glu Asp Leu Arg Tyr Glu Ile Val Ser

1521 1530 1539 1548 1557 1566

TCT GGC ACT CTC TCA CCA AAC GTC AAC ATC GTC GTC GAT CTG AAC AGC AGA GAG

Ser Gly Thr Leu Ser Gly Asn Val Lys Ile Val Val C Glu

1575 1584 1593 1602 1611 1620

TAC CAC CTG GAA AAA GAA GGA AAG TCC TCC CTC AAA AAA AGA GTC GTC AAA AGA

Tyr His Leu Glu Lys Glu Gly Lys Ser Ser Leu Lys Lys Arg Val Val Lys Arg

1629 1638 1647 1656 1665

GAA GAC GGA AGA AAC TTC TAC TTC TAC GAA CAC OCT GAG AGA GAA TCA 3'

Glu Asp Gly Arg Asn Phe Tyr Phe Tyr Glu Glu Gly Glu Arg Glu ***

Figure 10c (Continued)

Thermotoga maritima β-mannanase (6GP2)

9 18 27 36 45 54

5' ATG GGG ATT GGT GGC GAC GAC TCC TGG AGC CCG TCA CTA TCG GCG GAA TTC CTT

Met Gly Ile Gly Gly Asp Asp Ser Trp Ser Pro Ser Val Ser Ala Glu Phe Leu

63 72 81 90 99 108

TTA TTG ATC GTT GAG CTC TCT TTC GTT CTC TTT GCA AGT GAC GAG TTC GTG AAA

Leu Leu Ile Val Glu Leu Ser Phe Val Leu Phe Ala Ser Asp Glu Phe Val Lys

117 126 135 144 153 162

GTG GAA AAC GGA AAA TTC GCT CTG AAC GGA AAA GAA TTC AGA TTC ATT GGA AGC

Val Glu Asn Gly Lys Phe Ala Leu Asn Gly Lys Glu Phe Arg Phe Ile Gly Ser

171 180 189 198 207 216

AAC AAC TAC TAC ATG CAC TAC AAG AGC AAC GGA ATG ATA GAC AGT GTT CTG GAG

Asn Asn Tyr Tyr Met His Tyr Lys Ser Asn Gly Met Ile Asp Ser Val Leu Glu

225 234 243 252 261 270

AGT GCC AGA GAC ATG GGT ATA AAG GTC CTC AGA ATC TGG GGT TTC CTC GAC GGG

Ser Ala Arg Asp Met Gly Ile Lys Val Leu Arg Ile Trp Gly Phe Leu Asp Gly

279 288 297 306 315 324

GAG AGT TAC TGC AGA GAC AAG AAC ACC TAC ATG CAT CCT GAG CCC GGT GTT TTC

Glu Ser Tyr Cys Arg Asp Lys Asn Thr Tyr Met His Pro Glu Pro Gly Val Phe

333 342 351 360 369 378

GGG GTG CCA GAA GGA ATA TCG AAC GCC CAG AGC GGT TTC GAA AGA CTC GAC TAC

Gly Val Pro Glu Gly Ile Ser Asn Ala Gln Ser Gly Phe Glu Arg Leu Asp Tyr

387 396 405 414 423 432

ACA GTT GCG AAA GCG AAA GAA CTC GGT ATA AAA CTT GTC ATT GTT CTT GTG AAC

Thr Val Ala Lys Ala Lys Glu Leu Gly Ile Lys Leu Val Ile Val Leu Val Asn

441 450 459 468 477 486

AAC TGG GAC GAC TTC GGT GGA ATG AAC CAG TAC GTG AGG TGG TTT GGA GGA ACC

Asn Trp Asp Asp Phe Gly Gly Met Asn Gln Tyr Val Arg Trp Phe Gly Gly Thr

495 504 513 522 531 540

CAT CAC GAC GAT TTC TAC AGA GAT GAG AAG ATC AAA GAA GAG TAC AAA AAG TAC

His His Asp Asp Phe Tyr Arg Asp Glu Lys Ile Lys Glu Glu Tyr Lys Lys Tyr

Figure 11a

Thermotoga maritima β-mannanase (6GP2) (continued)

549 558 567 576 585 594

GTC TCC TTT CTC CTA AAC CAT GTC AAT ACC TAC ACG GGA GTT CCT TAC AGG GAA

Val Ser Phe Leu Val Asn His Val Asn Thr Tyr Thr Gly Val Pro Tyr Arg Glu

603 612 621 630 639 648

GAG CCC ACC ATC ATG GCC TGG GAG CTT GCA AAC GAA CCG CCC TGT GAG ACG GAC

Glu Pro Thr Ile Met Ala Trp Glu Leu Ala Asn Glu Pro Arg Cys Glu Thr Asp

657 666 675 684 693 702

AAA TCG GGG AAC ACG CTC GTT GAG TGG GTG AAG GAG ATG AGC TCC TAC ATA AAG

Lys Ser Gly Asn Thr Leu Val Glu Trp Val Lys Glu Met Ser Ser Tyr Ile Lys

711 720 729 738 747 756

AGT CTG GAT CCC AAC GAC CTC GTG GCT GTG GGG GAC GAA GGA TTC TTC AGC AAC

Ser Leu Asp Pro Asn His Leu Val Ala Val Gly Asp Glu Gly Phe Phe Ser Asn

765 774 783 792 801 810

TAC GAA GGA TO AAA CCT TAC GGT GGA GAA GCC GAG TGG GCC TAC AAC GGC TGG

Tyr Glu Gly Phe Lys Pro Tyr Gly Gly Glu Ala Glu Trp Ala Tyr Asn Gly Trp

819 828 837 846 855 864

TCC GGT GTT GAC TGG AAG AAG CTC CTT TCG ATA GAG ACG GTG GAC TTC GGC ACG

Ser Gly Val Asp Trp Lys Lys Leu Leu Ser Ile Glu Thr Val Asp Phe Gly Thr

873 882 891 900 909 918

TTC CAC CTC TAT CCG TCC CAC TGG GGT GTC AGT CCA GAG AAC TAT GCC CAG TGG

Phe His Leu Tyr Pro Ser His Trp Gly Val Ser Pro Glu Asn Tyr Ala Gln Trp

927 936 945 954 963 972

GGA GCG AAG TGG ATA GAA GAC CAC ATA AAG ATC GCA AAA GAG ATC GGA AAA CCC

Gly Ala Lys Trp Ile Glu Asp His Ile Lys Ile Ala Lys Glu Ile Gly Lys Pro

981 990 999 1008 1017 1026

GTT GTT CTG GAA GAA TAT GGA ATT CCA AAG AGT GCG CCA GTT AAC AGA ACG GCC

Val Val Leu Glu Glu Tyr Gly Ile Pro Lys Ser Ala Pro Val Asn Arg Thr Ala

1035 1044 1053 1062 1071 1080

ATC TAC AGA CTC TGG AAC GAT CTG CTC TAC GAT CTC GGT GGA GAT GGA GCG ATG

Ile Tyr Arg Leu Trp Asn Asp Leu Val Tyr Asp Leu Gly Gly Asp Gly Ala Met
Figure 11b (Continued)

Thermotoga maritima β-mannanase (6GP2) (continued)

1089 1098 1107 1116 1125 1134

TTC TGG ATG CTC GCG GGA ATC GGG GAA GGT TCG GAC AGA GAC GAG AGA GGG TAC

Phe Trp Met Leu Ala Gly Ile Gly Glu Gly Ser Asp Arg Asp Glu Arg Gly Tyr

1143 1152 1161 1170 1179 1188

TAT CCG GAC TAC GAC GGT TTC AGA ATA GTG AAC GAC GAC AGT CCA GAA GCG GAA

Tyr Pro Asp Tyr Asp Gly Phe Arg Ile Val Asn Asp Asp Ser Pro Glu Ala Glu

1197 1206 1215 1224 1233 1242

CTG ATA AGA GAA TAC GCG AAG CTG TTC AAC ACA GGT GAA GAC ATA AGA GAA GAC

Leu Ile Arg Glu Tyr Ala Lys Leu Phe Asn Thr Gly Glu Asp Ile Arg Glu Asp

1251 1260 1269 1278 1287 1296

ACC TGC TCP TTC ATC CTT CCA AAA GAC GGC ATG GAG ATC AAA AAG ACC GTG GAA

Thr Cys Ser Phe Ile Leu Pro Lys Asp Gly Met Glu Ile Lys Lys Thr Val Glu

1305 1314 1323 1332 1341 1350

GTG AGG GCT GGT GTT TTC GAC TAC AGC AAC ACG TTT GAA AAG TTG TCT GTC AAA

Val Arg Ala Gly Val Phe Asp Tyr Ser Asn Thr Phe Glu Lys Leu Ser Val Lys

1359 1368 1377 1386 1395 1404

GTC GAA GAT CTG GTT TTT GAA AAT GAG ATA GAG CAT CTC GGA TAC GGA ATT TAC

Val Glu Asp Leu Val Phe Glu Asn Glu Ile Glu His Leu Gly Tyr Gly Ile Tyr

1413 1422 1431 1440 1449 1458

GGC TTT GAT CTC GAC ACA ACC CGG ATC CCG GAT GGA GAA CAT GAA ATG TTC CTT

Gly Phe Asp Leu Asp Thr Thr Arg Ile Pro Asp Gly Glu His Glu Met Phe Leu

1467 1476 1485 1494 1503 1512

GAA GGC CAC TTT CAG GGA AAA ACG GTG AAA GAC TCT ATC AAA GCG AAA GTG GTG

Glu Gly His Phe Gln Gly Lys Thr Val Lys Asp Ser Ile Lys Ala Lys Val Val

1521 1530 1539 1548 1557 1566

AAC GAA GCA CGG TAC GTG CTC GCA GAG GAA GTT GAT TTT TCC TCT CCA GAA GAG

Asn Glu Ala Arg Tyr Val Leu Ala Glu Glu Val Asp Phe Ser Ser Pro Glu Glu

1575 1584 1593 1602 1611 1620

GTG AAA AAC TGG TGG AAC AGC GGA ACC TGG CAG CCA GAG TTC GGG TCA CCT GAC

Val Lys Asn Trp Trp Asn Ser Gly Thr Trp Gln Ala Glu Phe Gly Ser Pro Asp
Figure 11c (Continued)

Thermotoga maritima β-mannanase (continued)

1629 1638 1647 1656 1665 1674

ATT GAA TGG AAC GGT GAG GTG GGA AAT GGA GCA CTG CAG CTG AAC GTG AAA CTG

Ile Glu Trp Asn Gly Glu Val Gly Asn Gly Ala Leu Gln Leu Asn Val Lys Leu

1683 1692 1701 1710 1719 1728

CCC GGA AAG AGC GAC TGG GAA GAA GTG AGA GTA GCA AGG AAG TTC GAA AGA CTC

Pro Gly Lys Ser Asp Trp Glu Glu Val Arg Val Ala Arg Lys Phe Glu Arg Leu

1737 1746 1755 1764 1773 1782

TCA GAA TGT GAG ATC CTC GAG TAC GAC ATC TAG ATT CCA AAC GTC GAG GGA CTC

Ser Glu Cys Glu Ile Leu Glu Tyr Asp Ile Tyr Ile Pro Asn Val Glu Gly Leu

1791 1800 1809 1818 1827 1836

AAG GGA AGG TTG AGG CCG TAC GCG GTT CTG AAC CCC GGC TGG GTG AAG ATA GGC

Lys Gly Arg Leu Arg Pro Tyr Ala Val Leu Asn Pro Gly Trp Val Lys Ile Gly

1845 1854 1863 1872 1881 1890

CTC GAC ATG AAC AAC GCG AAC G?'* GAA ACT GCG GAG ATC ATC ACT TTC GGC GGA

Leu Asp Met Asn Asn Ala Asn Val Glu Ser Ala Glu Ile Ile Thr Phe Gly Gly

1899 1908 1917 1926 1935 1944

AAA GAG TAC AGA AGA TTC CAT GTA AGA ATT GAG TTC GAC AGA ATA GCG GGG GTG

Lys Glu Tyr Arg Arg Phe His Val Arg Ile Glu Phe Asp Arg Thr Ala Gly Val

1953 1962 1971 1980 1989 1998

AAA GAA CTT CAC ATA GGA GTT GTC GGT GAT CAT CTG AGG TAC GAT GGA CCG ATT

Lys Glu Leu His Ile Gly Val Val Gly Asp His Leu Arg Tyr Asp Gly Pro Ile

2007 2016 2025 2034 2043

TTC ATC GAT AAT GTG AGA CTT TAT AAA AGA ACA GGA GGT ATG TGA 3'

Phe Ile Asp Asn Val Arg Leu Tyr Lys Arg Thr Gly Gly Met ***

Figure 11d (Continued)

AEPII 1a β-mannosidase (63GB1)

9 18 27 36 45 54

5' ATG CTA CCA GAA GAG TTC CTA TGG CGC GTT GGG CAG TCA GGC TTT CAG TTC GAA

Met Leu Pro Glu Glu Phe Leu Trp Gly Val Gly Gln Ser Gly Phe Gln Phe Glu

63 72 81 90 99 108

ATG GGC GAC AAG CTC AGG AGG CAC ATC GAT CCA AAT ACC CAC TGG TGG AAG TGG

Met Gly Asp Lys Leu Arg Arg His Ile Asp Pro Asn Thr Asp Trp Trp Lys Trp

117 126 135 144 153 162

GTT CGC GAT CCT TTC AAC ATA AAA AAG GAG CTT GTG AGT GGG GAC CTT CCC GAG

Val Arg Asp Pro Phe Asn Ile Lys Lys Glu Leu Val Ser Gly Asp Leu Pro Glu

171 180 189 198 207 216

GAC GGC ATC AAC AAC TAC GAA CTT TTT GAA AAC GAT CAC AAG CTC GCT AAA GGC

Asp Gly Ile Asn Asn Tyr Glu Leu Phe Glu Asn Asp His Lys Leu Ala Lys Gly

225 234 243 252 261 270

CTT GGA CTC AAC GCA TAC AGG ATT GGA ATA GAG TGG AGC AGA ATC TTT CCC TGG

Leu Gly Leu Asn Ala Tyr Arg Ile Gly Ile Glu Trp Ser Arg Ile Phe Pro Trp

279 288 297 306 315 324

CCG ACG TGG ACG GTC GAT ACC GAG GTC GAG TTC GAC ACT TAC GCT TTA GTA AAG

Pro Thr Trp Thr Val Asp Thr Glu Val Glu Phe Asp Thr Tyr Gly Leu Val Lys

333 342 351 360 369 378

GAC GTT AAG ATA GAC AAG TCC ACC CTT GCT GAA CTC GAC AGG CTG GCC AAC AAG

Asp Val Lys Ile Asp Lys Ser Thr Leu Ala Glu Leu Asp Arg Leu Ala Asn Lys

387 396 405 414 423 432

GAG GAG GTA ATG TAC TAC AGG CGC GTT ATT CAG CAT TTG AGG GAG CTC GGC TTC

Glu Glu Val Met Tyr Tyr Arg Arg Val Ile Gln His Leu Arg Glu Leu Gly Phe

441 450 459 468 477 486

AAG GTC TTC GTT AAC CTC AAC CAC TTC ACG CTT CCA ATA TGG CTC CAC GAC CCG

Lys Val Phe Val Asn Leu Asn His Phe Thr Leu Pro Ile Trp Leu His Asp Pro

495 504 513 522 531 540

ATA GTG GCA AGG GAG AAG GCC CTC ACA AAC GAC AGA ATC GGC TGG GTC TCC CAG

Ile Val Ala Arg Glu Lys Ala Leu Thr Asn Asp Arg Ile Gly Trp Val Ser Gln

Figure 12a

AEPII 1a β-mannosidase (63GB1) (continued)

549 558 567 576 585 594

AGG ACA CTT GTT GAG TTT GCC AAG TAT GCT GCT TAC ATC GCC CAT GCG CTC GGA

Arg Thr Val Val Glu Phe Ala Lys Tyr Ala Ala Tyr Ile Ala His Ala Leu Gly

603 612 621 630 639 648

GAC CTC GTG GAC ACA TGG AGC ACC TTC AAC GAA CCT ATG GTA GTT GTG GAG CTC

Asp Leu Val Asp Thr Trp Ser Thr Phe Asn Glu Pro Met Val Val Val Glu Leu

657 666 675 684 693 702

GGC TAC CTC GCC CCC TAC TCA GGA TTT CCC CCG GGA GTC ATG AAC CCC GAG GCC

Gly Tyr Leu Ala Pro Tyr Ser Gly Phe Pro Pro Gly Val Met Asn Pro Glu Ala

711 720 729 738 747 756

GCG AAG CTG GCG ATC CTC AAC ATC ATA AAC GCC CAC GCC TTG GCA TAT AAG ATG

Ala Lys Leu Ala Ile Leu Asn Met Ile Asn Ala His Ala Leu Ala Tyr Lys Met

765 774 783 792 801 810

ATA AAG AGG TTC GAC ACC AAG AAG GCC GAT GAG GAT AGC AAG TCC CCT GCG GAC

Ile Lys Arg Phe Asp Thr Lys Lys Ala Asp Glu Asp Ser Lys Ser Pro Ala Asp

819 828 837 846 855 864

GTT GGC ATA ATT TAC AAC AAC ATC GGT GTT GCC TAC CCT AAA GAC CCT AAC GAT

Val Gly Ile Ile Tyr Asn Asn Ile Gly Val Ala Tyr Pro Lys Asp Pro Asn Asp

873 882 891 900 909 918

CCC AAG GAC GTT AAA GCA GCC GAA AAC GAC AAC TAC TTC CAC AGC GGA CTG TTC

Pro Lys Asp Val Lys Ala Ala Glu Asn Asp Asn Tyr Phe His Ser Gly Leu Phe

927 936 945 954 963 972

TTT GAT GCC ATC CAC AAG GGT AAG CTC AAC ATA GAC TTC GAC GGC GAA AAC TTT

Phe Asp Ala Ile His Lys Gly Lys Leu Asn Ile Glu Phe Asp Gly Glu Asn Phe

981 990 999 1008 1017 1026

GTA AAA GTT AGA CAC CTA AAA GGC AAT GAC TGG ATA GGC CTC AAC TAC TAC ACC

Val Lys Val Arg His Leu Lys Gly Asn Asp Trp Ile Gly Leu Asn Tyr Tyr Thr

1035 1044 1053 1062 1071 1080

CGC GAC GTT GTT AGA TAT TCG GAG CCC AAG TTC CCA AGT ATA CCC CTC ATA TCC

Arg Glu Val Val Arg Tyr Ser Glu Pro Lys Phe Pro Ser Ile Pro Leu Ile Ser

Figure 12b (Continued)

AEPII 1a β-mannosidase (63GB1) (continued)

1089 1098 1107 1116 1125 1134

TTC AAG CGC CTT CCC AAC TAC GGC TAC TCC TGC AGG CCC GGC ACG ACC TCC CCC

Phe Lys Gly Val Pro Asn Tyr Gly Tyr Ser Cys Arg Pro Gly Thr Thr Ser Ala

1143 1152 1161 1170 1179 1188

GAT GGC ATG CCC GTC AGC GAT ATC GGC TGG GAA GTC TAT CCC CAG GGA ATC TAC

Asp Gly Met Pro Val Ser Asp Ile Gly Trp Glu Val Tyr Pro Gln Gly Ile Tyr

1197 1206 1215 1224 1233 1242

GAC TCG ATA GTC GAG GCC ACC AAG TAC AGT GTT CCT GTT TAC GTC ACC GAG AAC

Asp Ser Ile Val Glu Ala Thr Lys Tyr Ser Val Pro Val Tyr Val Thr Glu Asn

1251 1260 1269 1278 1287 1296

GGT GTT GCG GAT TCC GCG GAC ACG CTG AGG CCA TAC TAC ATA GTC AGC CAC GTC

Gly Val Ala Asp Ser Ala Asp Thr Leu Arg Pro Tyr Tyr Ile Val Ser His Val

1305 1314 1323 1332 1341 1350

TCA AAG ATA GAG GAA GCC ATT GAG AAT GGA TAC CCC GTA AAA GGC TAC ATG TAC

Ser Lys Ile Glu Glu Ala Ile Glu Asn Gly Tyr Pro Val Lys Gly Tyr Met Tyr

1359 1368 1377 1386 1395 1404

TGG GCG CTT ACG GAT AAC TAC GAG TGG GCC CTC GGC TTC AGC ATG AGG TTT GGT

Trp Ala Leu Thr Asp Asn Tyr Glu Trp Ala Leu Gly Phe Ser Met Arg Phe Gly

1413 1422 1431 1440 1449 1458

CTC TAC AAG GTC GAC CTC ATC TCC AAG GAG AGG ATC CCG AGG GAG AGA AGC GTT

Leu Tyr Lys Val Asp Leu Ile Ser Lys Glu Arg Ile Pro Arg Glu Arg Ser Val

1467 1476 1485 1494 1503 1512

GAG ATA TAT CGC AGG ATA GTG CAG TCC AAC GGT GTT CCT AAG GAT ATC AAA GAG

Glu Ile Tyr Arg Arg Ile Val Gln Ser Asn Gly Val Pro Lys Asp Ile Lys Glu

1521 1530 1539

GAG TTC CTG AAG GGT GAG GAG AAA TGA 3'

Glu Phe Leu Lys Gly Glu Glu Lys ***

Figure 12c (Continued)

OC1/4V endoglucanase (33GP1)

9 18 27 36 45 54

5' ATG GTA GAA AGA CAC TTC AGA TAT GTT CTT ATT TGC ACC CTG TTT CTT GTT ATC

Met Val Glu Arg His Phe Arg Tyr Val Leu Ile Cys Thr Leu Phe Leu Val Met

63 72 81 90 99 108

CTC CTA ATC TCA TCC ACT CAG TGT GGA AAA AAT GAA CCA AAC AAA AGA GTG AAT

Leu Leu Ile Ser Ser Thr Gln Cys Gly Lys Asn Glu Pro Asn Lys Arg Val Asn

117 126 135 144 153 162

AGC ATG GAA CAG TCA GTT GCT GAA AGT GAT AGC AAC TCA GCA TTT GAA TAC AAC

Ser Met Glu Gln Ser Val Ala Glu Ser Asp Ser Asn Ser Ala Phe Glu Tyr Asn

171 180 189 198 207 216

AAA ATG GTA GGT AAA GGA GTA AAT ATT GGA AAT GCT TTA GAA GCT CCT TTC GAA

Lys Met Val Gly Lys Gly Val Asn Ile Gly Asn Ala Leu Glu Ala Pro Phe Glu

225 234 243 252 261 270

GGA GCT TGG GGA GTA AGA ATT GAG GAT GAA TAT TTT CAG ATA ATA AAG AAA AGG

Gly Ala Trp Gly Val Arg Ile Glu Asp Glu Tyr Phe Glu Ile Ile Lys Lys Arg

279 288 297 306 315 324

GGA TTT GAT TCT GTT AGG ATT CCC ATA AGA TGG TCA GCA CAT ATA TCC GAA AAG

Gly Phe Asp Ser Val Arg Ile Pro Ile Arg Trp Ser Ala His Ile Ser Glu Lys

333 342 351 360 369 378

CCA CCA TAT GAT ATT GAC AGG AAT TTC CTC GAA AGA GTT AAC CAT GTT GTC GAT

Pro Pro Tyr Asp Ile Asp Arg Asn Phe Leu Glu Arg Val Asn His Val Val Asp

387 396 405 414 423 432

AGG GCT CTT GAG AAT AAT TTA ACA GTA ATC ATC AAT ACG CAC CAT TTT GAA GAA

Arg Ala Leu Glu Asn Asn Leu Thr Val Ile Ile Asn Thr His His Phe Glu Glu

441 450 459 468 477 486

CTC TAT CAA GAA CCG GAT AAA TAC GGC GAT GTT TTG GTG GAA ATT TGG AGA CAC

Leu Tyr Gln Glu Pro Asp Lys Tyr Gly Asp Val Leu Val Glu Ile Trp Arg Gln

495 504 513 522 531 540

ATT GCA AAA TTC TTT AAA GAT TAC CCG GAA AAT CTG TTC TTT GAA ATC TAC AAC

Ile Ala Lys Phe Phe Lys Asp Tyr Pro Glu Asn Leu Phe Phe Glu Ile Tyr Asn

Figure 13a



WO 98/24799 



28/46 



PCT/US97/22623 



OC1/4V endoglucanase (33GP1) (continued)

549 558 567 576 585 594

CAG CCT GCT CAG AAC TTG ACA GCT GAA AAA TCG AAC GCA CTT TAT CCA AAA GTG

Glu Pro Ala Gln Asn Leu Thr Ala Glu Lys Trp Asn Ala Leu Tyr Pro Lys Val

603 612 621 630 639 648

CTC AAA GTT ATC AGG GAG AGC AAT CCA ACC CGG ATT GTC ATT ATC GAT GCT CCA

Leu Lys Val Ile Arg Glu Ser Asn Pro Thr Arg Ile Val Ile Ile Asp Ala Pro

657 666 675 684 693 702

AAC TGG GCA CAC TAT AGC GCA GTG AGA AGT CTA AAA TTA GTC AAC GAC AAA CGC

Asn Trp Ala His Tyr Ser Ala Val Arg Ser Leu Lys Leu Val Asn Asp Lys Arg

711 720 729 738 747 756

ATC ATT GTT TCC TTC CAT TAG TAC GAA CCT TTC AAA TTC ACA CAT CAG GCT GCC

Ile Ile Val Ser Phe His Tyr Tyr Glu Pro Phe Lys Phe Thr His Gln Gly Ala

765 774 783 792 801 810

GAA TGG GTT AAT CCC ATC CCA CCT GTT AGG GTT AAG TGG AAT GGC GAG GAA TGG

Glu Trp Val Asn Pro Ile Pro Pro Val Arg Val Lys Trp Asn Gly Glu Glu Trp

819 828 837 846 855 864

GAA ATT AAC CAA ATC AGA AGT CAT TTC AAA TAC GTG AGT GAC TGG GCA AAG CAA

Glu Ile Asn Gln Ile Arg Ser His Phe Lys Tyr Val Ser Asp Trp Ala Lys Gln

873 882 891 900 909 918

AAT AAC CTA CCA ATC TTT CTT GGT GAA TTC GGT GCT TAT TCA AAA GCA GAC ATG

Asn Asn Val Pro Ile Phe Leu Gly Glu Phe Gly Ala Tyr Ser Lys Ala Asp Met

927 936 945 954 963 972

GAC TCA AGG GTT AAG TGG ACC GAA AGT GTG AGA AAA ATG GCG GAA GAA TTT GGA

Asp Ser Arg Val Lys Trp Thr Glu Ser Val Arg Lys Met Ala Glu Glu Phe Gly

981 990 999 1008 1017 1026

TTT TCA TAC GCG TAT TGG GAA TTT TGT GCA GGA TTT GGC ATA TAC GAT AGA TGG

Phe Ser Tyr Ala Tyr Trp Glu Phe Cys Ala Gly Phe Gly Ile Tyr Asp Arg Trp

1035 1044 1053 1062 1071 1080

TCT CAA AAC TGG ATC GAA CCA TTG GCA ACA GCT GTG GTT GGC ACA GGC AAA GAG

Ser Gln Asn Trp Ile Glu Pro Leu Ala Thr Ala Val Val Gly Thr Gly Lys Glu

TAA 3'

***

Figure 13b (Continued)

Thermotoga maritima Pullulanase (6GP3)

9 18 27 36 45 54

5' ATG GAT CTT ACA AAG GTG GGC ATC ATA CTG AGG CTG AAC GAG TGG CAG GCA AAA

Met Asp Leu Thr Lys Val Gly Ile Ile Val Arg Leu Asn Glu Trp Gln Ala Lys

63 72 81 90 99 108

GAC GTG GCA AAA GAC AGG TTC ATA GAG ATA AAA GAC GGA AAG GCT GAA GTG TGG

Asp Val Ala Lys Asp Arg Phe Ile Glu Ile Lys Asp Gly Lys Ala Glu Val Trp

117 126 135 144 153 162

ATA CTC CAG GGA GTG GAA GAG ATT TTC TAC GAA AAA CCA GAC ACA TCT CCC AGA

Ile Leu Gln Gly Val Glu Glu Ile Phe Tyr Glu Lys Pro Asp Thr Ser Pro Arg

171 180 189 198 207 216

ATC TTC TTC GCA CAG GCA AGG TCG AAC AAG GTG ATC GAG GCT TTT CTG ACC AAT

Ile Phe Phe Ala Gln Ala Arg Ser Asn Lys Val Ile Glu Ala Phe Leu Thr Asn

225 234 243 252 261 270

CCT CTG GAT ACG AAA AAG AAA GAA CTC TTC AAG GTT ACT GTT GAC GGA AAA GAG

Pro Val Asp Thr Lys Lys Lys Glu Leu Phe Lys Val Thr Val Asp Gly Lys Glu

279 288 297 306 315 324

ATT CCC GTC TCA AGA GTG GAA AAG GCC GAT CCC ACG GAC ATA GAC GTG ACG AAC

Ile Pro Val Ser Arg Val Glu Lys Ala Asp Pro Thr Asp Ile Asp Val Thr Asn

333 342 351 360 369 378

TAC GTG AGA ATC GTC CTT TCT GAA TCC CTG AAA GAA GAA GAC CTC AGA AAA GAC

Tyr Val Arg Ile Val Leu Ser Glu Ser Leu Lys Glu Glu Asp Leu Arg Lys Asp

387 396 405 414 423 432

GTG GAA CTG ATC ATA GAA GGT TAC AAA CCG GCA AGA GTC ATC ATG ATG GAG ATC

Val Glu Leu Ile Ile Glu Gly Tyr Lys Pro Ala Arg Val Ile Met Met Glu Ile

441 450 459 468 477 486

CTG GAC GAC TAC TAT TAC GAT GGA GAG CTC GGA GCC GTA TAT TCT CCA GAG AAG

Leu Asp Asp Tyr Tyr Tyr Asp Gly Glu Leu Gly Ala Val Tyr Ser Pro Glu Lys

495 504 513 522 531 540

ACG ATA TTC ACA GTC TGG TCC CCC GTT TCT AAG TGG GTA AAG GTG CTT CTC TTC

Thr Ile Phe Arg Val Trp Ser Pro Val Ser Lys Trp Val Lys Val Leu Leu Phe

Figure 14a

Thermotoga maritima Pullulanase (6GP3) (continued)

549 558 567 576 585 594

AAA AAC CCA GAA GAC ACA CAA CCG TAC CAG GTT CTG AAC ATG GAA TAC AAG CCA

Lys Asn Gly Glu Asp Thr Glu Pro Tyr Gln Val Val Asn Met Glu Tyr Lys Gly

603 612 621 630 639 648

AAC GGG GTC TGG GAA GCG GTT GTT GAA GGC GAT CTC GAC GGA GTG TTC TAC CTC

Asn Gly Val Trp Glu Ala Val Val Glu Gly Asp Leu Asp Gly Val Phe Tyr Leu

657 666 675 684 693 702

TAT CAG CTG GAA AAC TAC GGA AAG ATC AGA ACA ACC GTC GAT CCT TAT TCG AAA

Tyr Gln Leu Glu Asn Tyr Gly Lys Ile Arg Thr Thr Val Asp Pro Tyr Ser Lys

711 720 729 738 747 756

GCG GTT TAC GCA AAC AAC CAA GAG AGC GCC GTT GTG AAT CTT GCC AGG ACA AAC

Ala Val Tyr Ala Asn Asn Gln Glu Ser Ala Val Val Asn Leu Ala Arg Thr Asn

765 774 783 792 801 810

CCA GAA GGA TGG GAA AAC GAC AGG GGA CCG AAA ATC GAA GGA TAC GAA GAC GCG

Pro Glu Gly Trp Glu Asn Asp Arg Gly Pro Lys Ile Glu Gly Tyr Glu Asp Ala

819 828 837 846 855 864

ATA ATC TAT GAA ATA CAC ATA GCG GAC ATC ACA GGA CTC GAA AAC TCC GGG GTA

Ile Ile Tyr Glu Ile His Ile Ala Asp Ile Thr Gly Leu Glu Asn Ser Gly Val

873 882 891 900 909 918

AAA AAC AAA GGC CTC TAT CTC GGG CTC ACC GAA GAA AAC ACG AAA GGA CCG GGC

Lys Asn Lys Gly Leu Tyr Leu Gly Leu Thr Glu Glu Asn Thr Lys Gly Pro Gly

927 936 945 954 963 972

GGT GTG ACA ACA GGC CTT TCG CAC CTT GTG GAA CTC GGT GTT ACA CAC GTT CAT

Gly Val Thr Thr Gly Leu Ser His Leu Val Glu Leu Gly Val Thr His Val His

981 990 999 1008 1017 1026

ATA CTT CCT TTC TTT GAT TTC TAC ACA GGC GAC GAA CTC GAT AAA GAT TTC GAG

Ile Leu Pro Phe Phe Asp Phe Tyr Thr Gly Asp Glu Leu Asp Lys Asp Phe Glu

1035 1044 1053 1062 1071 1080

AAG TAC TAC AAC TGG GGT TAC GAT CCT TAC CTG TTC ATG GTT CCG GAG GGC AGA

Lys Tyr Tyr Asn Trp Gly Tyr Asp Pro Tyr Leu Phe Met Val Pro Glu Gly Arg

Figure 14b (Continued) 
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Tbenaotoga smritisui ttallulaaasa (60P3) (continued) 

1089 1098 1107 1116 1125 U34 

TAC TCA ACC GAT CCC AAA AAC CCA CAC ACG AGA ATC AGA GAA CTC AAA GAA ATG 

Tyr Ser Thr Asp Pro Lys Asn Pro His Thr Arg lie Arg Glu Val Lys Glu Met 

1143 1152 U«l 1170 1179 H88 

GTC AAA GCC CTT CAC AAA CAC GGT ATA GGT GTG ATT ATG GAC ATG GTG TTC CCT 

Val Lys Ala Leu His Lys His Gly lie Gly Val lie Met Asp Met Val Phe Pro 

1197 1206 1215 1224 1233 1242 

CAC ACC TAC GGT ATA GGC GAA CTC TCT GCG TTC GAT CAG ACG GTG CCG TAC TAC 

His Thr Tyr Gly Ile Gly Glu Leu Ser Ala Phe Asp Gln Thr Val Pro Tyr Tyr

1251 1260 1269 1278 1287 1296 

TTC TAC AGA ATC GAC AAG ACA GGT GCC TAT TTG AAC GAA AGC GGA TGT GGT AAC 

Phe Tyr Arg Ile Asp Lys Thr Gly Ala Tyr Leu Asn Glu Ser Gly Cys Gly Asn

1305 1314 1323 1332 1341 1350 

GTC ATC GCA AGC GAA AGA CCC ATG ATG AGA AAA TTC ATA GTC GAT ACC GTC ACC 

Val Ile Ala Ser Glu Arg Pro Met Met Arg Lys Phe Ile Val Asp Thr Val Thr

1359 1368 1377 1386 1395 1404 

TAC TGG GTA AAG GAG TAT CAC ATA GAC GGA TTC AGG TTC GAT CAG ATG GGT CTC 

Tyr Trp Val Lys Glu Tyr His Ile Asp Gly Phe Arg Phe Asp Gln Met Gly Leu

1413 1422 1431 1440 1449 1458 

ATC GAC AAA AAG ACA ATG CTC GAA GTC GAA AGA GCT CTT CAT AAA ATC GAT CCA 

Ile Asp Lys Lys Thr Met Leu Glu Val Glu Arg Ala Leu His Lys Ile Asp Pro

1467 1476 1485 1494 1503 1512 

ACT ATC ATT CTC TAC GGC GAA CCG TGG GGT GGA TGG GGA GCA CCG ATC AGG TTT

Thr Ile Ile Leu Tyr Gly Glu Pro Trp Gly Gly Trp Gly Ala Pro Ile Arg Phe

1521 1530 1539 1548 1557 1566

GGA AAG AGC GAT GTC GCC GGC ACA CAC GTG GCA GCT TTC AAC GAT GAG TTC AGA 

Gly Lys Ser Asp Val Ala Gly Thr His Val Ala Ala Phe Asn Asp Glu Phe Arg 

1575 1584 1593 1602 1611 1620

GAC GCA ATA AGG GGT TCC GTG TTC AAC CCG AGC GTC AAG GGA TTC GTC ATG GGA 

Asp Ala Ile Arg Gly Ser Val Phe Asn Pro Ser Val Lys Gly Phe Val Met Gly

Figure 14c (Continued)
Thermotoga maritima pullulanase (6GP3) (continued)

1629 1638 1647 1656 1665 1674 

GGA TAC CCA AAG GAA ACC AAG ATC AAA AGG GCT GTT GTT GGA AGC ATA AAC TAC 

Gly Tyr Gly Lys Glu Thr Lys Ile Lys Arg Gly Val Val Gly Ser Ile Asn Tyr

1683 1692 1701 1710 1719 1728 

GAC GGA AAA CTC ATC AAA AGT TTC GCC CTT GAT CCA GAA GAA ACT ATA AAC TAC 

Asp Gly Lys Leu Ile Lys Ser Phe Ala Leu Asp Pro Glu Glu Thr Ile Asn Tyr

1737 1746 1755 1764 1773 1782 

GGA GCG TGT CAC GAC AAC CAC ACA CTG TGG GAC AAG AAC TAC CTT GCC GCC AAA 

Ala Ala Cys His Asp Asn His Thr Leu Trp Asp Lys Asn Tyr Leu Ala Ala Lys 

1791 1800 1809 1818 1827 1836

GCT GAT AAG AAA AAG GAA TGG ACC GAA GAA GAA CTG AAA AAC GCC CAG AAA CTG 

Ala Asp Lys Lys Lys Glu Trp Thr Glu Glu Glu Leu Lys Asn Ala Gln Lys Leu

1845 1854 1863 1872 1881 1890

GCT GGT GCG ATA CTT CTC ACT TCT CAA GCT GTT CCT TTC CTC CAC GGA GGG CAG

Ala Gly Ala Ile Leu Leu Thr Ser Gln Gly Val Pro Phe Leu His Gly Gly Gln

1899 1908 1917 1926 1935 1944 

CAC TTC TCC AGG ACG ACG AAT TTC AAC GAC AAC TCC TAC AAC GCC CCT ATC TCG

Asp Phe Cys Arg Thr Thr Asn Phe Asn Asp Asn Ser Tyr Asn Ala Pro Ile Ser

1953 1962 1971 1980 1989 1998

ATA AAC GGC TTC GAT TAC GAA AGA AAA CTT CAG TTC ATA GAC GTG TTC AAT TAC 

Ile Asn Gly Phe Asp Tyr Glu Arg Lys Leu Gln Phe Ile Asp Val Phe Asn Tyr

2007 2016 2025 2034 2043 2052 

CAC AAG GGT CTC ATA AAA CTC AGA AAA GAA CAC CCT GCT TTC AGG CTG AAA AAC 

His Lys Gly Leu Ile Lys Leu Arg Lys Glu His Pro Ala Phe Arg Leu Lys Asn

2061 2070 2079 2088 2097 2106 

GCT GAA GAG ATC AAA AAA CAC CTG GAA TTT CTC CCG GGC GGG AGA AGA ATA GTT 

Ala Glu Glu Ile Lys Lys His Leu Glu Phe Leu Pro Gly Gly Arg Arg Ile Val

2115 2124 2133 2142 2151 2160 

GCG TTC ATG CTT AAA GAC CAC GCA GGT GGT GAT CCC TGG AAA CAC ATC GTG GTG 

Ala Phe Met Leu Lys Asp His Ala Gly Gly Asp Pro Trp Lys Asp Ile Val Val

Figure 14d (Continued)
Thermotoga maritima pullulanase (6GP3) (continued)

2169 2178 2187 2196 2205 2214 

ATT TAC AAT GGA AAC TTA GAG AAG ACA ACA TAC AAA CTG CCA GAA GCA AAA TGG

Ile Tyr Asn Gly Asn Leu Glu Lys Thr Thr Tyr Lys Leu Pro Glu Gly Lys Trp

2223 2232 2241 2250 2259 2268

AAT GTG GTT GTG AAC AGC CAG AAA GCC GGA ACA GAA GTG ATA GAA ACC GTC GAA 

Asn Val Val Val Asn Ser Gln Lys Ala Gly Thr Glu Val Ile Glu Thr Val Glu

2277 2286 2295 2304 2313 

GGA ACA ATA GAA CTC GAT CCG CTT TCC GCG TAC GTT CTG TAC AGA GAG TGA 3 ' 

Gly Thr Ile Glu Leu Asp Pro Leu Ser Ala Tyr Val Leu Tyr Arg Glu ***



Figure 14e (Continued)
Figure 15a Thermotoga maritima MSB8 (Clone # 6GP2) Glycosidase

1 

CTT TTA TTG ATC GTT GAG CTC TCT TTC GTT CTC TTT GCA AGT GAC GAG TTC 
Leu Leu Leu Ile Val Glu Leu Ser Phe Val Leu Phe Ala Ser Asp Glu Phe

GTG AAA GTG GAA AAC GGA AAA TTC GCT CTG AAC GGA AAA GAA TTC AGA TTC 
Val Lys Val Glu Asn Gly Lys Phe Ala Leu Asn Gly Lys Glu Phe Arg Phe 

ATT GGA AGC AAC AAC TAC TAC ATG CAC TAC AAG AGC AAC GGA ATG ATA GAC 
Ile Gly Ser Asn Asn Tyr Tyr Met His Tyr Lys Ser Asn Gly Met Ile Asp

AGT GTT CTG GAG AGT GCC AGA GAC ATG GGT ATA AAG GTC CTC AGA ATC TGG 
Ser Val Leu Glu Ser Ala Arg Asp Met Gly Ile Lys Val Leu Arg Ile Trp

GGT TTC CTC GAC GGG GAG AGT TAC TGC AGA GAC AAG AAC ACC TAC ATG CAT 
Gly Phe Leu Asp Gly Glu Ser Tyr Cys Arg Asp Lys Asn Thr Tyr Met His

CCT GAG CCC GGT GTT TTC GGG GTG CCA GAA GGA ATA TCG AAC GCC CAG AGC 
Pro Glu Pro Gly Val Phe Gly Val Pro Glu Gly Ile Ser Asn Ala Gln Ser

GGT TTC GAA AGA CTC GAC TAC ACA GTT GCG AAA GCG AAA GAA CTC GGT ATA 
Gly Phe Glu Arg Leu Asp Tyr Thr Val Ala Lys Ala Lys Glu Leu Gly Ile

AAA CTT GTC ATT GTT CTT GTG AAC AAC TGG GAC GAC TTC GGT GGA ATG AAC 
Lys Leu Val Ile Val Leu Val Asn Asn Trp Asp Asp Phe Gly Gly Met Asn

CAG TAC GTG AGG TGG TTT GGA GGA ACC CAT CAC GAC GAT TTC TAC AGA GAT 
Gln Tyr Val Arg Trp Phe Gly Gly Thr His His Asp Asp Phe Tyr Arg Asp

GAG AAG ATC AAA GAA GAG TAC AAA AAG TAC GTC TCC TTT CTC GTA AAC CAT 
Glu Lys Ile Lys Glu Glu Tyr Lys Lys Tyr Val Ser Phe Leu Val Asn His

GTC AAT ACC TAC ACG GGA GTT CCT TAC AGG GAA GAG CCC ACC ATC ATG GCC 
Val Asn Thr Tyr Thr Gly Val Pro Tyr Arg Glu Glu Pro Thr Ile Met Ala

TGG GAG CTT GCA AAC GAA CCG CGC TGT GAG ACG GAC AAA TCG GGG AAC ACG 
Trp Glu Leu Ala Asn Glu Pro Arg Cys Glu Thr Asp Lys Ser Gly Asn Thr 
CTC GTT GAG TGG GTG AAG GAG ATG AGC TCC TAC ATA AAG AGT CTG GAT CCC
Leu Val Glu Trp Val Lys Glu Met Ser Ser Tyr Ile Lys Ser Leu Asp Pro

AAC CAC CTC GTG GCT GTG GGG GAC GAA GGA TTC TTC AGC AAC TAC GAA GGA 
Asn His Leu Val Ala Val Gly Asp Glu Gly Phe Phe Ser Asn Tyr Glu Gly 

TTC AAA CCT TAC GGT GGA GAA GCC GAG TGG GCC TAC AAC GGC TGG TCC GGT 
Phe Lys Pro Tyr Gly Gly Glu Ala Glu Trp Ala Tyr Asn Gly Trp Ser Gly 

GTT GAC TGG AAG AAG CTC CTT TCG ATA GAG ACG GTG GAC TTC GGC ACG TTC 
Val Asp Trp Lys Lys Leu Leu Ser Ile Glu Thr Val Asp Phe Gly Thr Phe

CAC CTC TAT CCG TCC CAC TGG GGT GTC AGT CCA GAG AAC TAT GCC CAG TGG 
His Leu Tyr Pro Ser His Trp Gly Val Ser Pro Glu Asn Tyr Ala Gln Trp

GGA GCG AAG TGG ATA GAA GAC CAC ATA AAG ATC GCA AAA GAG ATC GGA AAA 
Gly Ala Lys Trp Ile Glu Asp His Ile Lys Ile Ala Lys Glu Ile Gly Lys

CCC GTT GTT CTG GAA GAA TAT GGA ATT CCA AAG AGT GCG CCA GTT AAC AGA 
Pro Val Val Leu Glu Glu Tyr Gly Ile Pro Lys Ser Ala Pro Val Asn Arg

ACG GCC ATC TAC AGA CTC TGG AAC GAT CTG GTC TAC GAT CTC GGT GGA GAT 
Thr Ala Ile Tyr Arg Leu Trp Asn Asp Leu Val Tyr Asp Leu Gly Gly Asp

GGA GCG ATG TTC TGG ATG CTC GCG GGA ATC GGG GAA GGT TCG GAC AGA GAC 
Gly Ala Met Phe Trp Met Leu Ala Gly Ile Gly Glu Gly Ser Asp Arg Asp

GAG AGA GGG TAC TAT CCG GAC TAC GAC GGT TTC AGA ATA GTG AAC GAC GAC 
Glu Arg Gly Tyr Tyr Pro Asp Tyr Asp Gly Phe Arg Ile Val Asn Asp Asp

AGT CCA GAA GCG GAA CTG ATA AGA GAA TAC GCG AAG CTG TTC AAC ACA GGT 
Ser Pro Glu Ala Glu Leu Ile Arg Glu Tyr Ala Lys Leu Phe Asn Thr Gly

GAA GAC ATA AGA GAA GAC ACC TGC TCT TTC ATC CTT CCA AAA GAC GGC ATG 
Glu Asp Ile Arg Glu Asp Thr Cys Ser Phe Ile Leu Pro Lys Asp Gly Met

GAG ATC AAA AAG ACC GTG GAA GTG AGG GCT GGT GTT TTC GAC TAC AGC AAC 



Figure 15b (continued) 
Glu Ile Lys Lys Thr Val Glu Val Arg Ala Gly Val Phe Asp Tyr Ser Asn

ACG TTT GAA AAG TTG TCT GTC AAA GTC GAA GAT CTG GTT TTT GAA AAT GAG 
Thr Phe Glu Lys Leu Ser Val Lys Val Glu Asp Leu Val Phe Glu Asn Glu 

ATA GAG CAT CTC GGA TAC GGA ATT TAC GGC TTT GAT CTC GAC ACA ACC CGG 
Ile Glu His Leu Gly Tyr Gly Ile Tyr Gly Phe Asp Leu Asp Thr Thr Arg

ATC CCG GAT GGA GAA CAT GAA ATG TTC CTT GAA GGC CAC TTT CAG GGA AAA 
Ile Pro Asp Gly Glu His Glu Met Phe Leu Glu Gly His Phe Gln Gly Lys

ACG GTG AAA GAC TCT ATC AAA GCG AAA GTG GTG AAC GAA GCA CGG TAC GTG 
Thr Val Lys Asp Ser Ile Lys Ala Lys Val Val Asn Glu Ala Arg Tyr Val

CTC GCA GAG GAA GTT GAT TTT TCC TCT CCA GAA GAG GTG AAA AAC TGG TGG 
Leu Ala Glu Glu Val Asp Phe Ser Ser Pro Glu Glu Val Lys Asn Trp Trp 

AAC AGC GGA ACC TGG CAG GCA GAG TTC GGG TCA CCT GAC ATT GAA TGG AAC 
Asn Ser Gly Thr Trp Gln Ala Glu Phe Gly Ser Pro Asp Ile Glu Trp Asn

GGT GAG GTG GGA AAT GGA GCA CTG CAG CTG AAC GTG AAA CTG CCC GGA AAG 
Gly Glu Val Gly Asn Gly Ala Leu Gln Leu Asn Val Lys Leu Pro Gly Lys

AGC GAC TGG GAA GAA GTG AGA GTA GCA AGG AAG TTC GAA AGA CTC TCA GAA 
Ser Asp Trp Glu Glu Val Arg Val Ala Arg Lys Phe Glu Arg Leu Ser Glu 

TGT GAG ATC CTC GAG TAC GAC ATC TAC ATT CCA AAC GTC GAG GGA CTC AAG 
Cys Glu Ile Leu Glu Tyr Asp Ile Tyr Ile Pro Asn Val Glu Gly Leu Lys

GGA AGG TTG AGG CCG TAC GCG GTT CTG AAC CCC GGC TGG GTG AAG ATA GGC 
Gly Arg Leu Arg Pro Tyr Ala Val Leu Asn Pro Gly Trp Val Lys Ile Gly

CTC GAC ATG AAC AAC GCG AAC GTG GAA AGT GCG GAG ATC ATC ACT TTC GGC 
Leu Asp Met Asn Asn Ala Asn Val Glu Ser Ala Glu Ile Ile Thr Phe Gly

GGA AAA GAG TAC AGA AGA TTC CAT GTA AGA ATT GAG TTC GAC AGA ACA GCG 
Gly Lys Glu Tyr Arg Arg Phe His Val Arg Ile Glu Phe Asp Arg Thr Ala



Figure 15c (continued)
Figure No. 16 Thermotoga maritima MSB8 (6gb4)

1 ATG AAA AGA ATC GAC CTG AAT GGT TTC TGG AGC GTT AGG GAT AAC GAA GGG AGA TTT TCG 60 

1 Met Lys Arg Ile Asp Leu Asn Gly Phe Trp Ser Val Arg Asp Asn Glu Gly Arg Phe Ser 20

61 TTT GAA GGG ACT GTG CCA GGG GTT GTC CAG GCA GAT CTG GTC AGA AAA GGT CTT CTT CCA 120 

21 Phe Glu Gly Thr Val Pro Gly Val Val Gln Ala Asp Leu Val Arg Lys Gly Leu Leu Pro 40

121 CAC CCG TAC GTT GGG ATG AAC GAA GAT CTC TTC AAG GAA ATA GAA GAC AGA GAG TGG ATC 180

41 His Pro Tyr Val Gly Met Asn Glu Asp Leu Phe Lys Glu Ile Glu Asp Arg Glu Trp Ile 60

181 TAC GAG AGG GAG TTC GAG TTC AAA GAA GAT GTG AAA GAG GGG GAA CGT GTC GAT CTC GTT 240 

61 Tyr Glu Arg Glu Phe Glu Phe Lys Glu Asp Val Lys Glu Gly Glu Arg Val Asp Leu Val 80 

241 TTT GAG GGC GTC GAC ACG CTG TCG GAT GTT TAT CTG AAC GGT GTT TAC CTT GGA AGC ACC 300 

81 Phe Glu Gly Val Asp Thr Leu Ser Asp Val Tyr Leu Asn Gly Val Tyr Leu Gly Ser Thr 100

301 GAA GAC ATG TTC ATC GAG TAT CGC TTC GAT GTC ACG AAC GTG TTG AAA GAA AAG AAT CAC 360 

101 Glu Asp Met Phe Ile Glu Tyr Arg Phe Asp Val Thr Asn Val Leu Lys Glu Lys Asn His 120

361 CTG AAG GTG TAC ATA AAA TCT CCC ATC AGA GTT CCG AAA ACT CTC GAG CAG AAC TAC GGG 420 

121 Leu Lys Val Tyr Ile Lys Ser Pro Ile Arg Val Pro Lys Thr Leu Glu Gln Asn Tyr Gly 140

421 GTC CTC GGC GGT CCT GAA GAT CCC ATC AGA GGA TAC ATA AGA AAA GCC CAG TAT TCG TAC 480

141 Val Leu Gly Gly Pro Glu Asp Pro Ile Arg Gly Tyr Ile Arg Lys Ala Gln Tyr Ser Tyr 160

481 GGA TGG GAC TGG GGT GCC AGA ATC GTT ACA AGC GGT ATT TGG AAA CCC GTC TAC CTC GAG 540

161 Gly Trp Asp Trp Gly Ala Arg Ile Val Thr Ser Gly Ile Trp Lys Pro Val Tyr Leu Glu 180

541 GTG TAC AGG GCA CGT CTT CAG GAT TCA ACG GCT TAT CTG TTG GAA CTT GAG GGG AAA GAT 600 

181 Val Tyr Arg Ala Arg Leu Gln Asp Ser Thr Ala Tyr Leu Leu Glu Leu Glu Gly Lys Asp 200

601 GCC CTT GTG AGG GTG AAC GGT TTC GTA CAC GGG GAA GGA AAT CTC ATT GTG GAA GTT TAT 660 

201 Ala Leu Val Arg Val Asn Gly Phe Val His Gly Glu Gly Asn Leu Ile Val Glu Val Tyr 220

661 GTA AAC GGT GAA AAG ATA GGG GAG TTT CCT GTT CTT GAA AAG AAC GGA GAA AAG CTC TTC 720

221 Val Asn Gly Glu Lys Ile Gly Glu Phe Pro Val Leu Glu Lys Asn Gly Glu Lys Leu Phe 240

721 GAT GGA GTG TTC CAC CTG AAA GAT GTG AAA CTA TGG TAT CCG TGG AAC GTG GGG AAA CCG 780 

241 Asp Gly Val Phe His Leu Lys Asp Val Lys Leu Trp Tyr Pro Trp Asn Val Gly Lys Pro 260 
781 TAC CTG TAC GAT TTC GTT TTC GTG TTG AAA GAC TTA AAC GGA GAG ATC TAC AGA GAA GAA 840

261 Tyr Leu Tyr Asp Phe Val Phe Val Leu Lys Asp Leu Asn Gly Glu Ile Tyr Arg Glu Glu 280

841 AAG AAA ATC GGT TTG AGA AGA GTC AGA ATC GTT CAG GAG CCC GAT GAA GAA GGA AAA ACT 900 

281 Lys Lys Ile Gly Leu Arg Arg Val Arg Ile Val Gln Glu Pro Asp Glu Glu Gly Lys Thr 300

901 TTC ATA TTC GAA ATC AAC GGT GAG AAA GTC TTC GCT AAG GGT GCT AAC TGG ATT CCC TCA 960 

301 Phe Ile Phe Glu Ile Asn Gly Glu Lys Val Phe Ala Lys Gly Ala Asn Trp Ile Pro Ser 320

961 GAA AAC ATC CTC ACG TGG TTG AAG GAG GAA GAT TAC GAA AAG CTC GTC AAA ATG GCA AGG 1020 

321 Glu Asn Ile Leu Thr Trp Leu Lys Glu Glu Asp Tyr Glu Lys Leu Val Lys Met Ala Arg 340

1021 AGT GCC AAT ATG AAC ATG CTC AGG GTC TGG GGA GGA GGA ATC TAC GAG AGA GAG ATC TTC 1080 

341 Ser Ala Asn Met Asn Met Leu Arg Val Trp Gly Gly Gly Ile Tyr Glu Arg Glu Ile Phe 360

1081 TAC AGA CTC TGT GAT GAA CTC GGT ATC ATG GTG TGG CAG GAT TTC ATG TAC GCG TGT CTT 1140 

361 Tyr Arg Leu Cys Asp Glu Leu Gly Ile Met Val Trp Gln Asp Phe Met Tyr Ala Cys Leu 380

1141 GAA TAT CCG GAT CAT CTT CCG TGG TTC AGA AAA CTC GCG AAC GAA GAG GCA AGA AAG ATT 1200 

381 Glu Tyr Pro Asp His Leu Pro Trp Phe Arg Lys Leu Ala Asn Glu Glu Ala Arg Lys Ile 400

1201 GTG AGA AAA CTC AGA TAC CAT CCC TCC ATT GTT CTC TGG TGC GGA AAC AAC GAA AAC AAC 1260 

401 Val Arg Lys Leu Arg Tyr His Pro Ser Ile Val Leu Trp Cys Gly Asn Asn Glu Asn Asn 420

1261 TGG GGA TTC GAT GAA TGG GGA AAT ATG GCC AGA AAA GTG GAT GGT ATC AAC CTC GGA AAC 1320 

421 Trp Gly Phe Asp Glu Trp Gly Asn Met Ala Arg Lys Val Asp Gly Ile Asn Leu Gly Asn 440

1321 AGG CTC TAC CTC TTC GAT TTT CCT GAG ATT TGT GCC GAA GAA GAC CCG TCC ACT CCC TAT 1380 

441 Arg Leu Tyr Leu Phe Asp Phe Pro Glu Ile Cys Ala Glu Glu Asp Pro Ser Thr Pro Tyr 460

1381 TGG CCA TCC AGT CCA TAC GGC GGT GAA AAA GCG AAC AGC GAA AAG GAA GGA GAC AGG CAC 1440 

461 Trp Pro Ser Ser Pro Tyr Gly Gly Glu Lys Ala Asn Ser Glu Lys Glu Gly Asp Arg His 480

1441 GTC TGG TAC GTG TGG AGT GGC TGG ATG AAC TAC GAA AAC TAC GAA AAA GAC ACC GGA AGG 1500 

481 Val Trp Tyr Val Trp Ser Gly Trp Met Asn Tyr Glu Asn Tyr Glu Lys Asp Thr Gly Arg 500

1501 TTC ATC AGC GAG TTT GGA TTT CAG GGT GCT CCC CAT CCA GAG ACG ATA GAG TTC TTT TCA 1560 

501 Phe Ile Ser Glu Phe Gly Phe Gln Gly Ala Pro His Pro Glu Thr Ile Glu Phe Phe Ser 520

1561 AAA CCC GAG GAA AGA GAG ATA TTC CAT CCC GTC ATG CTG AAG CAC AAC AAA CAG GTG GAA 1620 

521 Lys Pro Glu Glu Arg Glu Ile Phe His Pro Val Met Leu Lys His Asn Lys Gln Val Glu 540

Figure 16b (continued)
1621 GGA CAG GAA AGA TTG ATC AGG TTC ATA TTC GGA AAT TTT GGA AAG TGT AAA GAT TTC GAC 1680

541 Gly Gln Glu Arg Leu Ile Arg Phe Ile Phe Gly Asn Phe Gly Lys Cys Lys Asp Phe Asp 560

1681 ACT TTT GTG TAT CTG TCC CAG CTC AAC CAG GCG GAG GCG ATC AAG TTC GGT GTT GAA CAC 1740 

561 Ser Phe Val Tyr Leu Ser Gln Leu Asn Gln Ala Glu Ala Ile Lys Phe Gly Val Glu His 580

1741 TGG CGA AGC AGG AAG TAC AAA ACG GCC GGC GCT CTC TTC TGG CAG TTC AAC GAC AGC TGG 1800

581 Trp Arg Ser Arg Lys Tyr Lys Thr Ala Gly Ala Leu Phe Trp Gln Phe Asn Asp Ser Trp 600

1801 CCG GTC TTC AGC TGG TCC GCA GTC GAT TAC TTC AAA AGG CCC AAA GCT CTC TAC TAC TAT 1860

601 Pro Val Phe Ser Trp Ser Ala Val Asp Tyr Phe Lys Arg Pro Lys Ala Leu Tyr Tyr Tyr 620 

1861 GCG AGA AGA TTC TTC GCT GAA GTT CTA CCC GTT TTG AAG AAG AGA GAC AAC AAA ATA GAA 1920 

621 Ala Arg Arg Phe Phe Ala Glu Val Leu Pro Val Leu Lys Lys Arg Asp Asn Lys Ile Glu 640

1921 CTG CTG GTG GGT GAG CGA TCT GAG GGA GAC AAA AGA AGT CTC TCT CAG GCT TGC AGC CTA 1980

641 Leu Leu Val Gly Glu Arg Ser Glu Gly Asp Lys Arg Ser Leu Ser Gln Ala Cys Ser Leu 660

1981 CGA GAA GAA GGG AGA AAA GGT ATT CGA AAA GAC TTA CAG AAC GGT ACT CCC AGC AGA CGG 2040 

661 Arg Glu Glu Gly Arg Lys Gly Ile Arg Lys Asp Leu Gln Asn Gly Thr Pro Ser Arg Arg 680

2041 TGT GAG TTT GGT TGA 2055 

681 Cys Glu Phe Gly End 685 



Figure 16c (continued)
Figure No. 17 Bankia gouldi (37gp4)

1 ATG AAA AAA AAT CTA CTA ATG TTT AAA AGG CTT ACG TAT CTA CCT TTG TTT TTA ATG CTG 60 

1 Met Lys Lys Asn Leu Leu Met Phe Lys Arg Leu Thr Tyr Leu Pro Leu Phe Leu Met Leu 20 

61 CTC TCA CTA AGT TCA GTA GCT CAA TCT CCT GTA GAA AAA CAT GGC CGT TTA CAA GTT GAC 120

21 Leu Ser Leu Ser Ser Val Ala Gln Ser Pro Val Glu Lys His Gly Arg Leu Gln Val Asp 40

121 GGA AAC CGC ATT CTT AAT GCG TCT GGA GAA ATT ACG AGC TTA GCT GGT AAC AGC CTC TTT 180 

41 Gly Asn Arg Ile Leu Asn Ala Ser Gly Glu Ile Thr Ser Leu Ala Gly Asn Ser Leu Phe 60

181 TGG AGT AAT GCT GGA GAC ACC TCC GAT TTT TAT AAT GCA GAA ACT GTT GAT TTT TTA GCA 240 

61 Trp Ser Asn Ala Gly Asp Thr Ser Asp Phe Tyr Asn Ala Glu Thr Val Asp Phe Leu Ala 80

241 GAA AAC TGG AAT AGC TCA CTT ATT AGA ATA GCT ATG GGC GTA AAA GAA AAT TGG GAT GGC 300 

81 Glu Asn Trp Asn Ser Ser Leu Ile Arg Ile Ala Met Gly Val Lys Glu Asn Trp Asp Gly 100

301 GGA AAT GGC TAT ATT GAT AGT CCG CAG GAG CAA GAA GCT AAA ATT AGA AAA GTT ATT GAT 360 

101 Gly Asn Gly Tyr Ile Asp Ser Pro Gln Glu Gln Glu Ala Lys Ile Arg Lys Val Ile Asp 120

361 GCA GCT ATT GCT AAC GGC ATA TAT GTA ATA ATA GAC TGG CAC ACT CAC GAA GCA GAG TTA 420 

121 Ala Ala Ile Ala Asn Gly Ile Tyr Val Ile Ile Asp Trp His Thr His Glu Ala Glu Leu 140

421 TAC ACA GAT GAG GCT GTT GAC TTT TTT ACC AGA ATG GCA GAC CTA TAC GGA GAT ACT CCC 480 

141 Tyr Thr Asp Glu Ala Val Asp Phe Phe Thr Arg Met Ala Asp Leu Tyr Gly Asp Thr Pro 160 

481 AAT GTA ATG TAT GAA ATT TAT AAC GAG CCT ATA TAC CAA AGT TGG CCT GTT ATT AAG AAT 540 

161 Asn Val Met Tyr Glu Ile Tyr Asn Glu Pro Ile Tyr Gln Ser Trp Pro Val Ile Lys Asn 180

541 TAT GCA GAG CAA GTA ATT GCT GGT ATA CGT TCT AAA GAC CCA GAT AAT TTA ATA ATT GTA 600 

181 Tyr Ala Glu Gln Val Ile Ala Gly Ile Arg Ser Lys Asp Pro Asp Asn Leu Ile Ile Val 200

601 GGT ACT AGC AAT TAT TCT CAG CAA GTT GAT GTA GCA TCA GCA GAC CCA ATA TCT GAT ACT 660 

201 Gly Thr Ser Asn Tyr Ser Gln Gln Val Asp Val Ala Ser Ala Asp Pro Ile Ser Asp Thr 220

661 AAT GTG GCA TAT ACT TTA CAT TTT TAT GCA GCA TTT AAC CCG CAT GAT AAC TTA AGA AAT 720 

221 Asn Val Ala Tyr Thr Leu His Phe Tyr Ala Ala Phe Asn Pro His Asp Asn Leu Arg Asn 240 

721 GTA GCA CAG ACA GCA TTA GAT AAT AAT GTT GCT TTG TTT GTT ACA GAA TGG GGT ACA ATT 780 

241 Val Ala Gln Thr Ala Leu Asp Asn Asn Val Ala Leu Phe Val Thr Glu Trp Gly Thr Ile 260
781 TTA AAT ACC GGA CAA GGA GAA CCA GAC AAA GAA AGC ACT AAT ACT TGG ATG GCC TTT TTG 840

261 Leu Asn Thr Gly Gln Gly Glu Pro Asp Lys Glu Ser Thr Asn Thr Trp Met Ala Phe Leu 280

841 AAA GAA AAA GGT ATA AGT CAC GCT AAT TGG TCT TTG AGT GAC AAA GCT TTT CCT GAA ACA 900

281 Lys Glu Lys Gly Ile Ser His Ala Asn Trp Ser Leu Ser Asp Lys Ala Phe Pro Glu Thr 300

901 GGG TCT GTA GTT CAA GCA GGA CAA GGT GTA TCT GGT TTA ATT AGC AAT AAA CTT ACA GCC 960

301 Gly Ser Val Val Gln Ala Gly Gln Gly Val Ser Gly Leu Ile Ser Asn Lys Leu Thr Ala 320

961 TCT GGT GAA ATT GTA AAA AAC ATC ATC CAA AAC TGG GAT ACA GAG ACC TCT ACA GGA CCT 1020

321 Ser Gly Glu Ile Val Lys Asn Ile Ile Gln Asn Trp Asp Thr Glu Thr Ser Thr Gly Pro 340

1021 AAA ACA ACA CAA TGT AGT ACT ATA GAA TGT ATT AGA GCT GCA ATG GAA ACA GCA CAA GCA 1080

341 Lys Thr Thr Gln Cys Ser Thr Ile Glu Cys Ile Arg Ala Ala Met Glu Thr Ala Gln Ala 360

1081 GGA GAT GAA ATT ATA ATT GCC CCT GGA AAC TAC AAT TTT CAA GAC AAG ATA CAA GGT GCC 1140

361 Gly Asp Glu Ile Ile Ile Ala Pro Gly Asn Tyr Asn Phe Gln Asp Lys Ile Gln Gly Ala 380

1141 TTT AAC CGT AGT GTT TAC CTT TAT GGT AGT GCT AAC GGA AAC AGT ACA AAC CCT ATT ATA 1200

381 Phe Asn Arg Ser Val Tyr Leu Tyr Gly Ser Ala Asn Gly Asn Ser Thr Asn Pro Ile Ile 400

1201 TTA AGA GGC GAA AGC GCT ACA AAC CCT CCT GTT TTC TCA GGA TTA GAT TAT AAC AAT GGC 1260

401 Leu Arg Gly Glu Ser Ala Thr Asn Pro Pro Val Phe Ser Gly Leu Asp Tyr Asn Asn Gly 420

1261 TAC CTA TTA AGT ATT GAA GGT GAT TAT TGG AAT ATT AAA GAT ATA GAG TTT AAA ACT GGG 1320

421 Tyr Leu Leu Ser Ile Glu Gly Asp Tyr Trp Asn Ile Lys Asp Ile Glu Phe Lys Thr Gly 440

1321 TCT AAA GGT ATT GTT CTT GAC AAT TCT AAT GGT AGT AAA TTA AAA AAC CTT GTT GTT CAT 1380

441 Ser Lys Gly Ile Val Leu Asp Asn Ser Asn Gly Ser Lys Leu Lys Asn Leu Val Val His 460

1381 GAT ATT GGA GAA GAA GCT ATT CAC TTG CGT GAT GGA TCT AGC AAT AAT AGT ATA GAT GGT 1440

461 Asp Ile Gly Glu Glu Ala Ile His Leu Arg Asp Gly Ser Ser Asn Asn Ser Ile Asp Gly 480

1441 TGC ACT ATA TAC AAT ACA GGT AGA ACT AAA CCT GGT TTT GGT GAA GGT TTA TAT GTA GGC 1500

481 Cys Thr Ile Tyr Asn Thr Gly Arg Thr Lys Pro Gly Phe Gly Glu Gly Leu Tyr Val Gly 500

1501 TCA GAT AAA GGA CAA CAT GAC ACT TAT GAA AGA GCT TGT AAC AAT AAC ACT ATT GAA AAC 1560

501 Ser Asp Lys Gly Gln His Asp Thr Tyr Glu Arg Ala Cys Asn Asn Asn Thr Ile Glu Asn 520

1561 TGT ACC GTT GGA CCC AAT GTA ACA GCA GAA GGC GTA GAT GTT AAG GAA GGT ACA ATG AAC 1620

521 Cys Thr Val Gly Pro Asn Val Thr Ala Glu Gly Val Asp Val Lys Glu Gly Thr Met Asn 540
1621 ACT ATT ATA AGA AAT TGC GTG TTT TCT GCA GAA GGA ATT TCA GGA GAA AAT AGC TCA GAT 1680

541 Thr Ile Ile Arg Asn Cys Val Phe Ser Ala Glu Gly Ile Ser Gly Glu Asn Ser Ser Asp 560

1681 GCT TTT ATT GAT TTA AAA GGA GCC TAT GGT TTT GTA TAC AGA AAC ACG TTT AAT GTT GAT 1740 

561 Ala Phe Ile Asp Leu Lys Gly Ala Tyr Gly Phe Val Tyr Arg Asn Thr Phe Asn Val Asp 580

1741 GGT TCT GAA GTA ATA AAT ACT GGA GTA GAC TTT TTA GAT AGA GGT ACA GGA TTT AAT ACA 1800 

581 Gly Ser Glu Val Ile Asn Thr Gly Val Asp Phe Leu Asp Arg Gly Thr Gly Phe Asn Thr 600

1801 GGT TTT AGA AAT GCA ATA TTT GAA AAT ACA TAT AAC CTT GGC ACT AGA GCT TCA GAA ATT 1860 

601 Gly Phe Arg Asn Ala Ile Phe Glu Asn Thr Tyr Asn Leu Gly Ser Arg Ala Ser Glu Ile 620

1861 TCA ACT GCT CGT AAA AAA CAA GGT TCT CCT GAA CAA ACT CAC GTT TGG GAT AAT ATT AGA 1920 

621 Ser Thr Ala Arg Lys Lys Gln Gly Ser Pro Glu Gln Thr His Val Trp Asp Asn Ile Arg 640

1921 AAC CCT AAT TCT GTT GAT TTT CCA ATA AGT GAT GGT ACA GAA AAT CTA GTA AAT AAA TTC 1980 

641 Asn Pro Asn Ser Val Asp Phe Pro Ile Ser Asp Gly Thr Glu Asn Leu Val Asn Lys Phe 660

1981 TGC CCA GAT TGG AAT ATA GAA CCA TGT AAT CCT GTA GAC GAA ACC AAC CAA GCA CCT ACA 2040 

661 Cys Pro Asp Trp Asn Ile Glu Pro Cys Asn Pro Val Asp Glu Thr Asn Gln Ala Pro Thr 680

2041 ATA AGC TTC CTA TCT CCT GTT AAC AAT ATT ACT TTA GTT GAA GGT TAT AAT TTA CAA GTT 2100 

681 Ile Ser Phe Leu Ser Pro Val Asn Asn Ile Thr Leu Val Glu Gly Tyr Asn Leu Gln Val 700

2101 GAA GTT AAT GCT ACT GAT GCA GAT GGA ACT ATT GAT AAT GTA AAA CTT TAT ATA GAT AAC 2160 

701 Glu Val Asn Ala Thr Asp Ala Asp Gly Thr Ile Asp Asn Val Lys Leu Tyr Ile Asp Asn 720

2161 AAT TTA GTT AGG CAA ATA AAT TCT ACT TCA TAT AAA TGG GGC CAT TCT GAT TCT CCA AAT 2220 

721 Asn Leu Val Arg Gln Ile Asn Ser Thr Ser Tyr Lys Trp Gly His Ser Asp Ser Pro Asn 740

2221 ACA GAT GAA CTT AAT GGT CTT ACA GAA GGA ACT TAT ACC TTA AAA GCA ATT GCA ACT GAT 2280

741 Thr Asp Glu Leu Asn Gly Leu Thr Glu Gly Thr Tyr Thr Leu Lys Ala Ile Ala Thr Asp 760

2281 AAC GAC GGG GCT TCT ACA GAA ACG CAA TTT ACG TTA ACT GTA ATA ACA GAA CAA AGT CCG 2340 

761 Asn Asp Gly Ala Ser Thr Glu Thr Gln Phe Thr Leu Thr Val Ile Thr Glu Gln Ser Pro 780

2341 TCT GAG AAT TGT GAC TTT AAT ACA CCT TCT TCA ACT GGT TTA GAA GAT TTT GAC ATT AAA 2400 

781 Ser Glu Asn Cys Asp Phe Asn Thr Pro Ser Ser Thr Gly Leu Glu Asp Phe Asp Ile Lys 800

2401 AAG TTT TCT AAC GTT TTT GAG TTA GGA TCT GGC GGA CCA TCT TTA AGT AAT TTA AAA ACA 2460 
801 Lys Phe Ser Asn Val Phe Glu Leu Gly Ser Gly Gly Pro Ser Leu Ser Asn Leu Lys Thr 820

2461 TTT ACT ATT AAT TGG AAT TCG CAA TAC AAT GGG TTA TAT CAA TTT TCA ATA AAC ACA AAC 2520 

821 Phe Thr Ile Asn Trp Asn Ser Gln Tyr Asn Gly Leu Tyr Gln Phe Ser Ile Asn Thr Asn 840

2521 AAC GGT GTA CCT GAT TAT TAT ATA AAT TTA AAA CCA AAA ATT ACC TTT CAG TTT AAA AAT 2580 

841 Asn Gly Val Pro Asp Tyr Tyr Ile Asn Leu Lys Pro Lys Ile Thr Phe Gln Phe Lys Asn 860

2581 GCA AAT CCA GAA ATA TCT ATT AGC AAT AGC TTA ATT CCT AAT TTT GAT GGT GAT TAC TGG 2640 

861 Ala Asn Pro Glu Ile Ser Ile Ser Asn Ser Leu Ile Pro Asn Phe Asp Gly Asp Tyr Trp 880

2641 GTA ACA TCA GAT AAC GGT AAT TTT GTG ATG GTA TCT AAA ACT AAT AAT TTT ACG ATA TAC 2700 

881 Val Thr Ser Asp Asn Gly Asn Phe Val Met Val Ser Lys Thr Asn Asn Phe Thr Ile Tyr 900

2701 TTT AGT AAT GAC GCT ACT GCT CCT ATT TGT AAT GTT ACG CCT AGT AAC CAA ATA AGT AAA 2760 

901 Phe Ser Asn Asp Ala Thr Ala Pro Ile Cys Asn Val Thr Pro Ser Asn Gln Ile Ser Lys 920

2761 ATT ACT GAT GAT TCT AGT ATT AAT TTT AAG CTT TAC CCT AAT CCT GCT TTA GAC GAA ACT 2820

921 Ile Thr Asp Asp Ser Ser Ile Asn Phe Lys Leu Tyr Pro Asn Pro Ala Leu Asp Glu Thr 940

2821 ATT TTT GTG AGC GCT GAA GAT GAA AAA CTA GCT TTG GTG CTT GTA CCA GT 2870 

941 Ile Phe Val Ser Ala Glu Asp Glu Lys Leu Ala Leu Val Leu Val Pro 956
Figure No. 100w Pyrococcus furiosus VC1 (7EG1)

leader sequence: amino acids 1-24 

9 18 27 36 45 54

5' ATG AGC AAG AAA AAG TTC GTC ATC GTA TCT ATC TTA ACA ATC CTT TTA GTA CAG
Met Ser Lys Lys Lys Phe Val Ile Val Ser Ile Leu Thr Ile Leu Leu Val Gln

63 72 81 90 99 108 

GCA ATA TAT TTT GTA GAA AAG TAT CAT ACC TCT GAG GAC AAG TCA ACT TCA AAT 
Ala Ile Tyr Phe Val Glu Lys Tyr His Thr Ser Glu Asp Lys Ser Thr Ser Asn

117 126 135 144 153 162 

ACC TCA TCT ACA CCA CCC CAA ACA ACA CTT TCC ACT ACC AAG GTT CTC AAG ATT
Thr Ser Ser Thr Pro Pro Gln Thr Thr Leu Ser Thr Thr Lys Val Leu Lys Ile

171 180 189 198 207 216 

AGA TAC CCT GAT GAC GGT GAG TGG CCA GGA GCT CCT ATT GAT AAG GAT GGT GAT 
Arg Tyr Pro Asp Asp Gly Glu Trp Pro Gly Ala Pro Ile Asp Lys Asp Gly Asp

225 234 243 252 261 270 

GGG AAC CCA GAA TTC TAC ATT GAA ATA AAC CTA TGG AAC ATT CTT AAT GCT ACT 
Gly Asn Pro Glu Phe Tyr Ile Glu Ile Asn Leu Trp Asn Ile Leu Asn Ala Thr

279 288 297 306 315 324 

GGA TTT GCT GAG ATG ACG TAC AAT TTA ACC AGC GGC GTC CTT CAC TAC GTC CAA
Gly Phe Ala Glu Met Thr Tyr Asn Leu Thr Ser Gly Val Leu His Tyr Val Gln

333 342 351 360 369 378

CAA CTT GAC AAC ATT GTC TTG AGG GAT AGA AGT AAT TGG GTG CAT GGA TAC CCC 
Gln Leu Asp Asn Ile Val Leu Arg Asp Arg Ser Asn Trp Val His Gly Tyr Pro

387 396 405 414 423 432 

GAA ATA TTC TAT GGA AAC AAG CCA TGG AAT GCA AAC TAC GCA ACT GAT GGC CCA 
Glu Ile Phe Tyr Gly Asn Lys Pro Trp Asn Ala Asn Tyr Ala Thr Asp Gly Pro

441 450 459 468 477 486 

ATA CCA TTA CCC AGT AAA GTT TCA AAC CTA ACA GAC TTC TAT CTA ACA ATC TCC 
Ile Pro Leu Pro Ser Lys Val Ser Asn Leu Thr Asp Phe Tyr Leu Thr Ile Ser
495 504 513 522 531 540

TAT AAA CTT GAG CCC AAG AAC GGC CTG CCA ATT AAC TTC GCA ATA GAA TCC TGG 
Tyr Lys Leu Glu Pro Lys Asn Gly Leu Pro Ile Asn Phe Ala Ile Glu Ser Trp

549 558 567 576 585 594

TTA ACG AGA GAA GCT TGG AGA ACA ACA GGA ATT AAC AGC GAT GAG CAA GAA GTA 
Leu Thr Arg Glu Ala Trp Arg Thr Thr Gly Ile Asn Ser Asp Glu Gln Glu Val

603 612 621 630 639 

ATG ATA TGG ATT TAC TAT GAC GGA TTA CAA CCG GCT GGC TCC AAA GTT 
Met Ile Trp Ile Tyr Tyr Asp Gly Leu Gln Pro Ala Gly Ser Lys Val

657 666 675 684 693 702

ATT GTA GTC CCA ATA ATA GTT AAC GGA ACA CCA GTA AAT GCT ACA TTT GAA GTA 
Ile Val Val Pro Ile Ile Val Asn Gly Thr Pro Val Asn Ala Thr Phe Glu Val

711 720 729 738 747 756 

TGG AAG GCA AAC ATT GGT TGG GAG TAT GTT GCA TTT AGA ATA AAG ACC CCA ATC 
Trp Lys Ala Asn Ile Gly Trp Glu Tyr Val Ala Phe Arg Ile Lys Thr Pro Ile

765 774 783 792 801 810 

AAA GAG GGA ACA GTG ACA ATT CCA TAC GGA GCA TTT ATA AGT GTT GCA GCC AAC 
Lys Glu Gly Thr Val Thr Ile Pro Tyr Gly Ala Phe Ile Ser Val Ala Ala Asn

819 828 837 846 855 864 

ATT TCA AGC TTA CCA AAT TAC ACA GAA CTT TAC TTA GAG GAC GTG GAG ATT GGA 
Ile Ser Ser Leu Pro Asn Tyr Thr Glu Leu Tyr Leu Glu Asp Val Glu Ile Gly

873 882 891 900 909 918 

ACT GAG TTT GGA ACG CCA AGC ACT ACC TCC GCC CAC CTA GAG TGG TGG ATC ACA 
Thr Glu Phe Gly Thr Pro Ser Thr Thr Ser Ala His Leu Glu Trp Trp Ile Thr

927 936 945 954 

AAC ATA ACA CTA ACT CCT CTA GAT AGA CCT CTT ATT TCC TAA 3 ' 
Asn Ile Thr Leu Thr Pro Leu Asp Arg Pro Leu Ile Ser *